Notation3: The Great QName Survey

Summary: This is a survey of the inconsistencies between the implementations of the Notation3 grammar, using the QName production as an example.

The following are the relevant pieces excerpted from nine Notation3 grammars and parsers (eight, if the two versions of notation3.py are counted as one). The /DesignIssues/Notation3 BNF is usually taken as definitive (but is broken). A conclusion and a recommendation on the actual QName production follow the various lists.

/DesignIssues/Notation3
   alpha = [A-Za-z]
   alphanumeric = [A-Za-z0-9_]
   prefix = ( alpha alphanumeric* ) | '_'
   localname = alpha alphanumeric*
   qname = prefix ":" localname
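To make the DesignIssues production concrete, here is a minimal sketch that transcribes it into a Python regular expression. The names ALPHA, PREFIX, QNAME and is_designissues_qname are illustrative only and do not come from any of the implementations surveyed. Note that ":name" is rejected because neither alternative of prefix can be empty.

```python
import re

# Character classes from the DesignIssues BNF, written as regex fragments.
ALPHA = "[A-Za-z]"
ALPHANUMERIC = "[A-Za-z0-9_]"

# prefix = ( alpha alphanumeric* ) | '_'
PREFIX = f"(?:{ALPHA}{ALPHANUMERIC}*|_)"
# localname = alpha alphanumeric*
LOCALNAME = f"{ALPHA}{ALPHANUMERIC}*"
# qname = prefix ":" localname
QNAME = re.compile(f"{PREFIX}:{LOCALNAME}")


def is_designissues_qname(text: str) -> bool:
    """True if the whole of `text` matches the DesignIssues qname production."""
    return QNAME.fullmatch(text) is not None


for candidate in ["foaf:name", "_:b0", ":name", "foaf:_x", "a1:b2"]:
    print(f"{candidate!r:12} {is_designissues_qname(candidate)}")
```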

/2000/10/swap/notation3.py
   _namechars = [a-z] + [A-Z] + [0-9] + '_-'
   qname() => _namechars* ':' _namechars* # v 1.87+ 2001/08/23

/2000/10/swap/rdfn3.g
   PREFIX: r'[a-zA-Z0-9_-]*:'
   QNAME: r'([a-zA-Z][a-zA-Z0-9_-]*)?:[a-zA-Z0-9_-]+'
   EXVAR: r'_:[a-zA-Z0-9_-]+'
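The rdfn3.g token patterns are written as Python raw-string regular expressions, so they can be checked directly. The sketch below copies PREFIX and QNAME verbatim and tests isolated token strings with a hypothetical helper, token_matches (a simplification of how a scanner applies a pattern at its current position).

```python
import re

# Token patterns copied from the rdfn3.g excerpt.
PREFIX = re.compile(r"[a-zA-Z0-9_-]*:")
QNAME = re.compile(r"([a-zA-Z][a-zA-Z0-9_-]*)?:[a-zA-Z0-9_-]+")


def token_matches(pattern: re.Pattern, text: str) -> bool:
    """Check an isolated token string against a whole-token pattern."""
    return pattern.fullmatch(text) is not None


# "_0:" is a legal PREFIX token, so "_0" can be declared as a prefix...
print(token_matches(PREFIX, "_0:"))     # True
# ...but "_0:foo" is not a QNAME, since a QNAME prefix must start with a letter.
print(token_matches(QNAME, "_0:foo"))   # False
```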

/2000/10/n3/notation3.py
   _namechars = [a-z] + [A-Z] + [0-9] + '_-'
   qname() => ( _namechars* ':' _namechars* ) | _namechars*

/2000/10/swap/n3spark.py
   qname: r' [a-zA-Z0-9_-]*:[a-zA-Z0-9_-]* '

/2001/03/flaten3/lexer.l
   wordchar ([_A-Za-z$!]|[0-9])
   {wordchar}*":"{wordchar}+

/cvsweb/~checkout~/2001/blindfold/sample/n3.bnf
   alpha ::= [a-zA-Z];
   alphanumeric ::= alpha | [0-9] | "_";
   nprefix ::= "" | ((alpha | "_") alphanumeric*);
   localname ::= alpha alphanumeric*;
   qname ::= nprefix ":" localname;

RDF::Notation3/Notation3.pm
   $tk =~ /^([_a-zA-Z]\w*)?:$/o
   $tk =~ /^([_a-zA-Z]\w*)?:[a-zA-Z]\w*$/o

eep/n3.py
   Name = r'[A-Za-z0-9_]+'
   bNode = r'_:' + Name
   QName = r'[A-Za-z0-9]*:' + Name
   Prefix = r'[A-Za-z0-9]*:'

To summarize the various productions in a canonical format:-

/DesignIssues/Notation3
   prefix = [A-Za-z][A-Za-z0-9_]* | '_'
   name = [A-Za-z][A-Za-z0-9_]*

/2000/10/swap/notation3.py
   prefix = [A-Za-z0-9_-]*
   name = [A-Za-z0-9_-]*

/2000/10/swap/rdfn3.g
   prefix = [A-Za-z0-9_-]* | ([A-Za-z][A-Za-z0-9_-]*)? # PREFIX vs. QNAME token: inconsistent
   name = [A-Za-z0-9_-]+

/2000/10/swap/n3spark.py
   prefix = [A-Za-z0-9_-]*
   name = [A-Za-z0-9_-]*

/2001/03/flaten3/lexer.l
   prefix = [A-Za-z0-9_$!]*
   name = [A-Za-z0-9_$!]+

/cvsweb/~checkout~/2001/blindfold/sample/n3.bnf
   prefix = '' | [A-Za-z_][A-Za-z0-9_]*
   name = [A-Za-z][A-Za-z0-9_]*

RDF::Notation3/Notation3.pm
   prefix = ([A-Za-z_]\w*)?
   name = [A-Za-z]\w*

eep/n3.py
   prefix = [A-Za-z0-9]* | '_'
   name = [A-Za-z0-9_]+

For comparison:-

Prefixes
   DesignIssues = [A-Za-z][A-Za-z0-9_]* | '_'
   notation3.py = [A-Za-z0-9_-]*
   rdfn3.g      = [A-Za-z0-9_-]* | ([A-Za-z][A-Za-z0-9_-]*)? # PREFIX vs. QNAME token: inconsistent
   n3spark.py   = [A-Za-z0-9_-]*
   lexer.l      = [A-Za-z0-9_$!]*
   n3.bnf       = [A-Za-z_][A-Za-z0-9_]* | ''
   Notation3.pm = ([A-Za-z_]\w*)?
   Eep n3.py    = [A-Za-z0-9]* | '_'

Names
   DesignIssues = [A-Za-z][A-Za-z0-9_]*
   notation3.py = [A-Za-z0-9_-]*
   rdfn3.g      = [A-Za-z0-9_-]+
   n3spark.py   = [A-Za-z0-9_-]*
   lexer.l      = [A-Za-z0-9_$!]+
   n3.bnf       = [A-Za-z][A-Za-z0-9_]*
   Notation3.pm = [A-Za-z]\w*
   Eep n3.py    = [A-Za-z0-9_]+
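The two lists can be turned into a small test harness that shows, for a handful of strings, which productions accept them as QNames. This is a sketch under stated assumptions. For rdfn3.g it uses the prefix part of the QNAME token. Perl's \w in Notation3.pm is written out as its ASCII meaning, [A-Za-z0-9_]. The sample strings and the accepts helper are hypothetical. A rejection of "_:b0" only means the string is not a QName for that production (rdfn3.g, for instance, has a separate EXVAR token for it).

```python
import re

# Canonical (prefix, name) productions from the comparison lists.
PRODUCTIONS = {
    "DesignIssues": (r"[A-Za-z][A-Za-z0-9_]*|_", r"[A-Za-z][A-Za-z0-9_]*"),
    "notation3.py": (r"[A-Za-z0-9_-]*", r"[A-Za-z0-9_-]*"),
    "rdfn3.g": (r"(?:[A-Za-z][A-Za-z0-9_-]*)?", r"[A-Za-z0-9_-]+"),
    "n3spark.py": (r"[A-Za-z0-9_-]*", r"[A-Za-z0-9_-]*"),
    "lexer.l": (r"[A-Za-z0-9_$!]*", r"[A-Za-z0-9_$!]+"),
    "n3.bnf": (r"[A-Za-z_][A-Za-z0-9_]*|", r"[A-Za-z][A-Za-z0-9_]*"),
    "Notation3.pm": (r"(?:[A-Za-z_][A-Za-z0-9_]*)?", r"[A-Za-z][A-Za-z0-9_]*"),
    "Eep n3.py": (r"[A-Za-z0-9]*|_", r"[A-Za-z0-9_]+"),
}

# qname = prefix ":" name, compiled once per implementation.
COMPILED = {
    impl: re.compile(f"(?:{prefix}):(?:{name})")
    for impl, (prefix, name) in PRODUCTIONS.items()
}

# Hypothetical test strings chosen to probe the differences.
SAMPLES = ["foaf:name", ":name", "_:b0", "ex:_x", "ex:0", "a-b:c", "ex:$x", "ex:"]


def accepts(impl: str, qname: str) -> bool:
    """True if the given implementation's production accepts the whole string."""
    return COMPILED[impl].fullmatch(qname) is not None


for impl in PRODUCTIONS:
    accepted = [s for s in SAMPLES if accepts(impl, s)]
    print(f"{impl:14} {accepted}")
```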

It is interesting that only two pairs agree on the QName production: notation3.py with n3spark.py, and n3.bnf with Notation3.pm (Perl's \w being [A-Za-z0-9_] for ASCII input). As already mentioned, the DesignIssues BNF is slightly ambiguous, in that it gives the prefix as either "alpha alphanumeric*" or a lone "_", which doesn't make sense; it even disallows the empty ("void") prefix, as in ":name". The rdfn3.g grammar is broken because it allows (for example) "_0" to be declared as a prefix, but not used in a QName. The Eep n3.py version was my initial interpretation of the production, and I will change it to the one I recommend below.

In general, it is better to "be conservative in what you write, and liberal in what you accept" (to paraphrase Tim Berners-Lee), so Notation3 parsers should probably use the most liberal of the productions above, and Notation3 writers (including humans) the most conservative. However, it would be nice if everyone could agree on a production.
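One way to read this advice is sketched below. It assumes that "most liberal" means "accepted by any production" and that "most conservative" means "accepted by every production." No single production above is a superset of the others: notation3.py accepts "-" but not "$", and lexer.l the reverse. Only three productions are listed for brevity. The function names liberal_accepts and conservative_ok are hypothetical.

```python
import re

# Three of the canonical QName productions (prefix ":" name); the others
# would be added to this table in the same way.
PRODUCTIONS = {
    "DesignIssues": re.compile(r"(?:[A-Za-z][A-Za-z0-9_]*|_):[A-Za-z][A-Za-z0-9_]*"),
    "notation3.py": re.compile(r"[A-Za-z0-9_-]*:[A-Za-z0-9_-]*"),
    "lexer.l": re.compile(r"[A-Za-z0-9_$!]*:[A-Za-z0-9_$!]+"),
}


def liberal_accepts(qname: str) -> bool:
    """Parser side: accept anything that at least one production accepts."""
    return any(p.fullmatch(qname) for p in PRODUCTIONS.values())


def conservative_ok(qname: str) -> bool:
    """Writer side: only emit QNames that every production accepts."""
    return all(p.fullmatch(qname) for p in PRODUCTIONS.values())


for q in ["foaf:name", "a-b:c", "ex:$x", ":name"]:
    print(q, liberal_accepts(q), conservative_ok(q))
```

Taking the union on the parser side and the intersection on the writer side means a writer never emits a QName that some parser in the table would reject.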

One implementation question is whether or not "_" as the bNode prefix should be overridable. CWM allows this, but I think that it is confusing for people who are trying to learn Notation3, and it is not all that difficult to ban in a parser. The recommendation below is based upon all of the productions above.
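In a parser the ban can be a single check at declaration time. Here is a minimal sketch using a hypothetical PrefixTable class, which is not taken from CWM or any of the parsers surveyed. Declared prefixes are stored in a dictionary, and any attempt to declare "_" is rejected.

```python
class PrefixTable:
    """Maps declared prefixes to namespace URIs, refusing to rebind "_"."""

    def __init__(self) -> None:
        self._bindings: dict[str, str] = {}

    def declare(self, prefix: str, namespace: str) -> None:
        # "_" is reserved for bNodes, so a "_:" prefix declaration is an error.
        if prefix == "_":
            raise ValueError('the bNode prefix "_" cannot be redeclared')
        self._bindings[prefix] = namespace

    def expand(self, prefix: str, name: str) -> str:
        """Expand prefix:name to a URI; raises KeyError for undeclared prefixes."""
        return self._bindings[prefix] + name


table = PrefixTable()
table.declare("", "http://example.org/#")
print(table.expand("", "name"))  # http://example.org/#name
```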

Recommendation
   prefix = ([A-Za-z][A-Za-z0-9_]*)? | '_'
   name = [A-Za-z0-9_]+

Notes on the recommendation: the hyphen-minus "-" character is disallowed, although notation3.py, rdfn3.g and n3spark.py accept it, since the DesignIssues note excludes it from its grammar, reserving the character for future use. I have allowed "_" as the first character of a name because I have seen this used in various Notation3 files already, notwithstanding the fact that the DesignIssues BNF, n3.bnf, and Notation3.pm disallow it.
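The recommended production can be sketched as a Python regular expression, with a hypothetical split_qname helper that returns the prefix and name. The examples cover both points in the notes: a hyphen is rejected, and a name may begin with "_". Because name is [A-Za-z0-9_]+, a name may also begin with a digit. A prefix such as "_0" is still rejected, since "_" is only allowed on its own.

```python
import re
from typing import Optional, Tuple

# Recommended production:
#   prefix = ([A-Za-z][A-Za-z0-9_]*)? | '_'
#   name   = [A-Za-z0-9_]+
PREFIX = r"(?:[A-Za-z][A-Za-z0-9_]*)?|_"
NAME = r"[A-Za-z0-9_]+"
QNAME = re.compile(f"(?P<prefix>{PREFIX}):(?P<name>{NAME})")


def split_qname(text: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, name) if text is a QName under the recommendation, else None."""
    m = QNAME.fullmatch(text)
    return (m.group("prefix"), m.group("name")) if m else None


for q in ["foaf:name", ":name", "_:b0", "ex:_x", "a-b:c", "_0:foo"]:
    print(q, split_qname(q))
```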

Todo: now repeat for all of the Notation3 productions!

Sean B. Palmer