RDF for XML Enthusiasts

I’ve been learning RDF and SPARQL for a while now, and recently I had the chance to talk to a number of MarkLogic customers who are using RDF with or alongside MarkLogic. One of the results of that is this presentation, which I gave at XML Prague this weekend.

On determining itemtype-subtype() for recursive types

There were questions as to whether type checking using a lenient recursive type alias like I proposed in my previous post would terminate. I’ve been doing some research - so if like me you struggle to know the correct academic term to use to search for information, here is a handy guide:

Equirecursive types: The kind of type recursion we’re talking about for XSLT 3.0. More formally, a type system where A is considered the same type as B, given:

<xsl:type-alias name="A" as="map(xs:string, A)"/>
<xsl:type-alias name="B" as="map(xs:string, map(xs:string, B))"/>

In other words, a recursive type is considered equal to it’s unrolling by one level.

Isorecursive types: Recursive types that provide new type annotations. An unrolled isorecursive type is not considered equal to the original type. These are apparently considered simpler to reason about than equirecursive types.

Unification: An algorithm that allows you to check equality of two types (well, directed graphs actually). If implemented correctly, it handles graphs with cycles (recursive types), as well as graphs with variables (parametric types). Incidently the same algorithm an be used for both pattern matching and type inference.

My source material for the algorithm comes from the "Dragon Book", section 6.5.5. The unification algorithm for ItemType equality would look something like this:

1) Express the types as directed graphs with type aliases expanded (I’ve numbered the nodes so I can refer to them later):

<!-- Slightly different defintion of B this time -->
<xsl:type-alias name="A" as="map(xs:string, A)"/>
<xsl:type-alias name="B" as="map(xs:string, map(xs:NMTOKEN, B))"/>
A:         map[1]-+
            /\    |
           /  ----+
B:         map[3]-----+
            /\        |
           /  \       |
xs:string[4]   map[5] |
               /\     |
              /  -----+

2) The basic idea is to start with two nodes from the graphs, M and N, that you want to assert are equal. The algorithm keeps track of sets of nodes that must be equal for this assertion to be true, called equivalence classes. Equivalence classes are represented by a single node picked from that set.

3) find(n) returns the node representing the equivalence class for the node n. Each node starts off in it’s own equivalence class, and these classes get merged successively by the union(m,n) procedure.

4) The algorithm proceeds:

boolean unify(M, N) {
   S = find(M), T = find(N);
   if(S = T) return true;
   if(S and T have the same node label and the same number of children) {
     union(S, T);
     return every child s in S, child t in T satisfies unify(s, t)
   return false;

5) The algorithm terminates because the number of equivalence classes is reduced by 1 for each recursion. Finally, we can extend the notion of unification to use equivalence classes based on itemtype-subtype() rules. We only need to make sure that every node in the equivalence class is a subtype of the representative node. By doing so, our unification algorithm verifies a subtype relationship rather than an equality relationship.

A working of the itemtype-subtype() algorithm for the types I named A and B above:

i) Equivalence classes: (map[1]), (xs:string[2]), (map[3]), (xs:string[4]), (map[5]), (xs:NMTOKEN[6])

unify(map[1], map[3])
   - S = map[1], T = map[3]
   - S and T are matching non-leaf nodes
   - union(S, T)
   - recursively check the children

ii) Equivalence classes: (map[1], map[3]), (xs:string[2]), (xs:string[4]), (map[5]), (xs:NMTOKEN[6])

unify(xs:string[2], xs:string[4])
   - S = xs:string[2], T = xs:string[4]
   - S is a subtype of T
   - union(S, T)
   - return true (no children to check)

iii) Equivalence classes: (map[1], map[3]), (xs:string[2], xs:string[4]), (map[5]), (xs:NMTOKEN[6])

unify(map[1], map[5])
   - S = map[1], T = map[5]
   - S and T are matching non-leaf nodes
   - union(S, T)
   - recursively check the children

iv) Equivalence classes: (map[1], map[3], map[5]), (xs:string[2], xs:string[4]), (xs:NMTOKEN[6])

unify(xs:string[2], xs:NMTOKEN[6])
   - S = xs:string[2], T = xs:NMTOKEN[6]
   - S is a subtype of T
   - union(S, T)
   - return true (no children to check)

v) Equivalence classes: (map[1], map[3], map[5]), (xs:string[2], xs:string[4], xs:NMTOKEN[6])

unify(map[1], map[3])
   - S = map[1], T = map[1]
   - S = T
   - return true

And we successfully determine that the recursive ItemType A is a subtype of the recursive ItemType B.

Proposal to add Type Aliases to XSLT 3.0

Type Alias Declaration

<!-- Category: declaration -->
   name = qname
   as = item-type
   visibility? = "public" | "private" />

[DEFINITION: A type alias declaration adds a binding to the static context, between an expanded-QName and an ItemType, called the aliased type. The aliased type is specified in the mandatory “as” attribute.]

[ERR TBD] A type alias name must be in a namespace, and must not be the same as the name of any type in the in-scope schema components, or any other type alias name.

Type aliases are imported and included from stylesheet modules, and may be exported by a package where they can be given visibilities of “public” and “private”. NOTE: This proposal does not provide a way for type aliases to be overridden.

[DEFINITION: A type alias may be used as a SequenceType by using a type alias reference, which is a SequenceType consisting only of a bare EQName (which is parsed as an ItemType by the AtomicOrUnionType grammar production in XPath 3.0) resolving to an expanded-QName equal to the type alias name.] A type alias reference behaves exactly as if the aliased type had been used in it’s place. A type alias is in scope for every stylesheet module in the stylesheet, and may be used anywhere an ItemType is expected. NOTE: Type aliases do not provide new type annotations with which values are annotated - in this way they do not define new types, and are true to the name “alias”.

NOTE: This does not permit the use of a type alias in a number of places where an atomic type name is valid, including a cast expression and as the TypeName in an element test. [DEFINITION: A type alias A trivially depends on a type alias B if A’s aliased type is a type alias reference to B, or to a type alias C that itself trivially depends on B.] [ERR TBD] It is an error for a type alias to trivially depend on itself.

NOTE: This proposal does not extend to defining parametrized type aliases, but does not preclude adding them in the future. Examples:
<xsl:type-alias name="my:complexNumber"
   as="map(xs:boolean, xs:double)"/>

A type alias for a complex number implemented as a map from booleans to doubles.

<xsl:type-alias name="my:predicate"
   as="function(item()) as xs:boolean"/>

A type alias for a function which can be used as a predicate.

<xsl:type-alias name="my:tree" as="map(xs:string,my:tree)"/>

A recursive type alias for an edge labelled tree structure implemented using maps.

Other Changes

Add a paragraph to the end of Section 22.1.3 of XSLT 3.0:

If a new type is added to the type system, implementations may denote that type in the SequenceType syntax using a bare EQName in a (non-null) implementation specific namespace - which is parsed as an ItemType by the AtomicOrUnionType grammar production in XPath 3.0.


Ant Classpath Woes

When building the specification documents for the W3C XQuery and XSLT working groups, I’m constantly finding I have to battle against Java’s over-engineered dependency injection based XSLT APIs.

I could rant about how an XSLT 1.0 processor won’t do when I’ve got XSLT 2.0 stylesheets - indeed generally speaking I choose my libraries carefully because I want to use exactly those ones, not some others that happens to be hanging around.

Anyhow, this isn’t meant to be that rant. This is just to document the solution to my problems. It seems that the shell script that launches ant likes to add it’s dependencies to the classpath, despite my wishes to the contrary. The way to force it to my will is to use the CLASSPATH_OVERRIDE environment variable, thus: CLASSPATH_OVERRIDE=1 CLASSPATH=../../lib/xercesImpl.jar:../../lib/saxon9.jar:../../lib/saxon.jar ant

Sweet joy!

Australian Humour

A real email my sister received from Townsville Freecycle! Sent: Wednesday, 2 February 2011 12:39 PM
Subject: [townsville_freecycle] OFFER - Tropical Cyclone YASI (Category 5) - Townsville

No longer wanted, to large to fit anywhere. To be taken away ASAP please. Thanks in advance.
Ice Crystals in Wytham WoodsPosted from:  Oxfordshire County OX2, UK

Ice Crystals in Wytham Woods

Posted from: Oxfordshire County OX2, UK

Interesting use case for the XQuery 1.1 switch expression

XQuery 1.1 introduces a new switch expression, which was intended for matching an expression against a number of different possible values.

However, it can also be used instead of doing a series of nested if expressions, like this: switch(true())
case $a > 0 return “positive”
case $a < 0 return “negative”
default return “zero”

I think that could become a common usage pattern.

Transferring websites to my iPhone with QR codes

I often want to transfer a website I’ve been reading from my desktop to my iPhone so I can read it elsewhere. The solution I’ve come up with is the following javascript bookmark to generate a QR code for the current website I’m browsing: javascript:var%20url%20=%20””%20+%20encodeURIComponent(window.location);window.location%20=%20url;

Saving that to a bookmark on your browser’s toolbar, and you can easily create a QR code for any website. You can then use iPhone apps like ZBar to read the QR code direct from your monitor and open the webpage.
Definitely the best toad-in-the-hole I&#8217;ve ever cooked.

Definitely the best toad-in-the-hole I’ve ever cooked.