matb33.me

Mathieu Bouchard

XPath lang() function broken?

From all the research and testing I’ve done on the XPath lang() function, I can only conclude that it is broken. Its behavior does not make sense to me. Perhaps my expectation of its functionality is wrong because I’m using it in an unintended way.

Allow me to illustrate with an example:

<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>

I found this example snippet in a few places on the web, and I believe it originally comes from a W3C example.

Here’s an example XSLT template I could run against the above XML to filter it down to my current language/locale:

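<!-- Assumes $lang is a top-level parameter. XSLT 1.0 does not allow variable
     references in match patterns, so this form needs XSLT 2.0 or a lenient processor. -->
<xsl:template match="p[ lang( $lang ) ]">
    <xsl:copy-of select="." />
</xsl:template>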

Scenario 1: $lang = 'en'

This renders output that looks odd but is correct, because every <p> element passes the lang() test. The output would look like this:

The quick brown fox jumps over the lazy dog.
What colour is it?
What color is it?
  • en matched en (correctly)
  • en matched en-US (correctly)
  • en matched en-GB (correctly)

Scenario 2: $lang = 'en-US'

This is where I personally feel lang() handles the situation incorrectly. The output would look like this:

What color is it?

So... what happened to "The quick brown fox jumps over the lazy dog."?

  • en-US did not match en (incorrectly)
  • en-US did not match en-GB (correctly)
  • en-US matched en-US (correctly)
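
Both scenarios follow from how XPath 1.0 defines lang(): it returns true when the language of the context node (taken from the nearest xml:lang attribute) is the same as the argument, or is a sublanguage of it, ignoring case. Here is a minimal Python sketch of that rule. It is only a model, not a real XPath engine, and the names xpath_lang, paragraphs and select are hypothetical stand-ins for the XML and template above:

```python
def xpath_lang(node_lang, arg):
    """Model of XPath 1.0 lang(): True when node_lang equals arg, or is
    arg followed by a '-' suffix, compared case-insensitively."""
    node_lang, arg = node_lang.lower(), arg.lower()
    return node_lang == arg or node_lang.startswith(arg + "-")


# Hypothetical stand-in for the three <p> elements: (xml:lang, text).
paragraphs = [
    ("en", "The quick brown fox jumps over the lazy dog."),
    ("en-GB", "What colour is it?"),
    ("en-US", "What color is it?"),
]


def select(lang):
    """Return the texts whose xml:lang passes the modeled lang() test."""
    return [text for tag, text in paragraphs if xpath_lang(tag, lang)]


print(select("en"))     # all three paragraphs
print(select("en-US"))  # only the en-US paragraph
```

The test only ever runs one way: the node's language has to be the argument or something narrower than it. A node tagged en is broader than en-US, so it fails.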

lang('en-US') does not accept a node tagged en, whereas I believe it should. To me, en-US is part of the general English language (en), a subset of the core language, so content tagged en should match! Ultimately, the result I want to see if I set $lang to en-US is:

The quick brown fox jumps over the lazy dog.
What color is it?
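
Getting the result above means running the test in the other direction: keep a paragraph when $lang is the same as its xml:lang or a sublanguage of it. Here is a hedged Python sketch of that reversed rule. The names accepts, paragraphs and select are hypothetical, and this is just one reading of the behavior I want, not something lang() provides:

```python
def accepts(node_lang, requested):
    """True when `requested` equals node_lang or is a sublanguage of it,
    i.e. the reverse direction of the XPath lang() test."""
    node_lang, requested = node_lang.lower(), requested.lower()
    return requested == node_lang or requested.startswith(node_lang + "-")


# Hypothetical stand-in for the three <p> elements: (xml:lang, text).
paragraphs = [
    ("en", "The quick brown fox jumps over the lazy dog."),
    ("en-GB", "What colour is it?"),
    ("en-US", "What color is it?"),
]


def select(requested):
    """Return the texts whose xml:lang is at least as general as `requested`."""
    return [text for tag, text in paragraphs if accepts(tag, requested)]


print(select("en-US"))  # the generic en paragraph plus the en-US one
```

In XPath 1.0 itself, something similar could presumably be built from starts-with() and concat() against the nearest xml:lang attribute, though case-insensitive comparison would need extra care.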

I have a feeling I’m simply not using lang() in the way it was intended. But does that mean I'm wrong about my interpretation?

[reproduced and edited from an old blog post I wrote at McMillan]