Internet Draft                                            Sean B. Palmer
Document: draft-palmer-resource-uri.txt                        June 2002
Expires: December 1, 2002


           The "resource" Uniform Resource Identifier Scheme

Status Of This Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-draft will expire on September 1, 2002.

Copyright Notice

   Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

   A "resource" URI allows one to identify a resource using properties
   of the resource alone. The set of properties and their objects is
   encoded into the URI itself, so that "resource" URIs are protocol
   independent; an abbreviation mechanism is provided to keep the
   amount of unnecessary data in the URI down.

1. Introduction

   "resource" URIs can be thought of as identifying an existentially
   quantified node, with the property and object pairs being predicates
   of that node; one may say that a "resource" URI identifies
   "something which has these properties". This URI scheme has its
   roots in the work on Uniform Resource Characteristics [URC], and in
   recent work on the Semantic Web [SW].

   To provide for extensibility, and to enable the URIs to be deployed
   on a global scale, URI-views (absoluteURIs with an optional
   fragment; defined in section 2.1.3) are used to denote properties.
   The utility of self-contained and self-defining URIs is that anyone
   can create a new property, reducing the need for new URI schemes.
   The object values may be URI-views or unicode string literals.

2. The "resource" URI Scheme

2.1 Syntactic Structure

   Using ABNF as defined in [RFC 2234].

      resourceURI  = "resource:" *nsbinding popair *( ";" popair )
      nsbinding    = "@" name "=" escapedURIv ";"
      popair       = property "=" object
      property     = ( ( "$" escapedURIv ) / QName )
      object       = ( ( "$" escapedURIv ) / literal )
      escapedURIv  = <defined in section 2.1.1>
      QName        = name ":" name
      literal      = *( ( alphanum / "_" / "." / "-" ) / escaped )
                     ; alphanum and escaped from [RFC 2396]
      name         = 1*alpha
                     ; alpha from [RFC 2396]

   The alphanum, escaped, and alpha productions are imported from
   [RFC 2396]. escapedURIv is defined in section 2.1.1 of this
   document, and literal is defined in section 2.1.2 ibid.

   Examples of syntactically valid "resource" URIs include:-

2.1.1 The escapedURIv Production

   An escapedURIv is an encoded URI-view (defined in section 2.1.3)
   which consists only of valid URI characters minus ";", "=", and "#".
   escapedURIv instances have a one-to-one mapping with URI-views, and
   one can convert between the two forms via a simple algorithm.

   Algorithm for creating an escapedURIv from a URI-view:

      * iterate through the URI-view four times:
         * on the first iteration, replace all '%' with '%25'
         * on the second iteration, replace all '#' with '%23'
         * on the third iteration, replace all '=' with '%3D'
         * on the fourth iteration, replace all ';' with '%3B'

   Python code for this algorithm is available from Appendix A, part i.

   Algorithm for creating a URI-view from an escapedURIv:

      * iterate through the escapedURIv four times:
         * on the first iteration, replace all '%3B' with ';'
         * on the second iteration, replace all '%3D' with '='
         * on the third iteration, replace all '%23' with '#'
         * on the fourth iteration, replace all '%25' with '%'

   Python code for this algorithm is available from Appendix A, part
   ii.
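   The order of the replacements matters: "%" is escaped first and
   unescaped last, so that the "%" characters introduced by the other
   escapes are not themselves re-encoded. The following is a minimal
   round-trip sketch, written for Python 3 rather than the Python 2.2
   of Appendix A. The names quote_uri_view and unquote_escaped_uriv are
   illustrative only and are not defined by this document.

```python
# Replacement pairs in the order given in section 2.1.1.
ESCAPES = (("%", "%25"), ("#", "%23"), ("=", "%3D"), (";", "%3B"))


def quote_uri_view(uri_view: str) -> str:
    """Create an escapedURIv from a URI-view ('%' is handled first)."""
    for raw, encoded in ESCAPES:
        uri_view = uri_view.replace(raw, encoded)
    return uri_view


def unquote_escaped_uriv(escaped: str) -> str:
    """Create a URI-view from an escapedURIv ('%25' is handled last)."""
    for raw, encoded in reversed(ESCAPES):
        escaped = escaped.replace(encoded, raw)
    return escaped


# A URI-view that already contains a "%3B" escape as well as literal
# ";", "=" and "#" characters should survive the round trip unchanged.
tricky = "http://example.org/a%3Bb;c=d#e"
assert unquote_escaped_uriv(quote_uri_view(tricky)) == tricky
```

   Reversing the table for decoding gives exactly the four steps
   listed for the second algorithm above.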
2.1.2 The literal Production

   The literal production specifies a string of encoded unicode
   [Unicode] characters, fit for use in a URI.

   Algorithm for creating a literal from a unicode string:

      * encode the unicode string as UTF-8 via the process set out in
        [RFC 2279]
      * for each octet in the UTF-8 encoded unicode string:
         * if the octet is not in [A-Za-z0-9_.-]:
            * replace the octet with its %HH URI encoded counterpart,
              as specified in [RFC 2396] part 2.4.1

   Python code for this algorithm is available from Appendix A, part
   iii.

   Algorithm for creating a unicode string from a literal:

      * replace each %HH URI escaped octet in the literal with the
        octet that it encodes
      * UTF-8 un-encode the resulting string via the process set out
        in [RFC 2279]

   Python code for this algorithm is available from Appendix A, part
   iv.

2.1.3 Definition of "URI-view"

   A URI-view is defined as:-

      URI-view = absoluteURI *1( "#" fragment )
                 ; absoluteURI and fragment from [RFC 2396]

   i.e. an absoluteURI plus an optional fragment. This differs from the
   URI-reference production of [RFC 2396] in that relative URIs are
   not allowed.

2.1.4 Additional Constraints

   Any QName used as an instance of the property production must have
   had its prefix already declared in an appropriate nsbinding
   instance. For further details, refer to section 2.3: "QName
   Processing".

2.2 Equality of "resource" URIs

   The scheme component of the URI is case insensitive (following the
   advice in section 3.1 of [RFC 2396]). The QName production is also
   case insensitive. Therefore, the following URIs are equivalent:-

      resource:@blargh=http://example.org/%23;blargh:gurk=cyrker
      RESOURCE:@Blargh=http://example.org/%23;BLARGH:gurk=cyrker

   The escapedURIv production defers to the specification of the URI
   that is being encoded. The literal production is case sensitive.

2.3 QName Processing

   "resource" URIs are made up of a series of property/value pairs,
   which themselves are made up of URI-view and unicode string
   datatypes.
   Because URI-view properties often share a common base or prefix
   (often referred to as a "namespace": not to be confused with a
   namespace in XML), we provide a special QName binding syntax to
   allow for some abbreviation.

   Instances of the nsbinding production specify a binding from a
   short alphabetic string (referred to here as the "prefix") to a
   URI-view (referred to here as the "namespace"). The prefix can then
   be used in lieu of the namespace wherever a property may appear;
   i.e. an instance of the QName production can be used as a syntactic
   shorthand for the corresponding URI-view.

   For example, in the following URI:-

      resource:@blargh=http://example.org/%23;blargh:gurk=cyrker

   the nsbinding component consists of the prefix "blargh" and the
   namespace (after unencoding) "http://example.org/#". In the popair,
   the QName "blargh:gurk" is used, comprising the prefix "blargh" and
   the local name "gurk". To obtain the resultant URI-view, one takes
   the prefix, replaces it with the namespace to which it was earlier
   bound, and appends the local name. Thus, we end up with the
   URI-view:-

      http://example.org/#gurk

   Note that for "blargh" we could have used any other alphabetic
   string. Note also that this means that the following two "resource"
   URIs are completely equivalent:-

      resource:@blargh=http://example.org/%23;blargh:gurk=cyrker
      resource:$http://example.org/%23gurk=cyrker

   In this case, the one with the namespace binding declared is longer,
   but when many QNames are utilized, binding can result in a great
   reduction in URI length.

3. Notes for implementers

   The "resource" URI syntax was constructed with parsability,
   concision, and readability in mind. Once the scheme has been
   stripped, the nsbinding and popair instances can be separated by
   splitting on the ";" character. Distinguishing nsbinding from popair
   instances, and escapedURIv from QName/literal instances, is a matter
   of testing for the first character being "@" or "$" respectively.
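   As a rough illustration of the splitting described here, the
   following Python 3 sketch separates a "resource" URI into its raw
   nsbinding and popair parts without decoding them. The names
   split_resource_uri and is_uri_view are hypothetical helpers, not
   part of this specification, and the sketch assumes a syntactically
   valid input.

```python
SCHEME = "resource:"


def is_uri_view(token: str) -> bool:
    """A leading "$" marks an escapedURIv rather than a QName/literal."""
    return token.startswith("$")


def split_resource_uri(uri: str):
    """Return (bindings, pairs), each a list of raw (left, right) tuples."""
    # The scheme is compared case insensitively, as in section 2.2.
    if uri[:len(SCHEME)].lower() != SCHEME:
        raise ValueError("not a resource URI")
    bindings, pairs = [], []
    for part in uri[len(SCHEME):].split(";"):
        # Unpacking fails with ValueError unless exactly one "=" occurs.
        left, right = part.split("=")
        if left.startswith("@"):
            bindings.append((left[1:], right))  # prefix, escaped namespace
        else:
            pairs.append((left, right))         # property, object
    return bindings, pairs


bindings, pairs = split_resource_uri(
    "resource:@blargh=http://example.org/%23;blargh:gurk=cyrker")
# bindings holds one (prefix, namespace) tuple, pairs one (property,
# object) tuple; neither side has been unescaped at this point.
```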
   The "=" character should occur only once, as the separator, in each
   nsbinding and popair instance.

4. Security Considerations

   This URI scheme does not introduce any known new security concerns.

5. IANA Considerations

   None.

6. Acknowledgements

   Many thanks to deltab for his review, and to Jill Lundquist (and her
   PGP key) for completely inadvertently providing the inspiration for
   this URI scheme.

References

   [RFC 2234]  D. Crocker, P. Overell (1997). Augmented BNF for Syntax
               Specifications: ABNF.

   [RFC 2279]  F. Yergeau (1998). UTF-8, a transformation format of
               ISO 10646.

   [RFC 2396]  T. Berners-Lee, R. Fielding, L. Masinter (1998). Uniform
               Resource Identifiers (URI): Generic Syntax.

   [SW]        http://www.w3.org/2001/sw/

   [Unicode]   http://www.unicode.org/

   [URC]       http://www.hypernews.org/HyperNews/get/www/URCs.html

Appendix A: Python code for quoting algorithms

   The following functions were tested using Python 2.2, available from
   http://www.python.org/2.2/

   i)

   def quote(uri):
       """Function for creating an escapedURIv from a URI-view."""
       uri = uri.replace('%', '%25')
       uri = uri.replace('#', '%23')
       uri = uri.replace('=', '%3D')
       uri = uri.replace(';', '%3B')
       return uri

   ii)

   def unquote(uri):
       """Function for creating a URI-view from an escapedURIv."""
       uri = uri.replace('%3B', ';')
       uri = uri.replace('%3D', '=')
       uri = uri.replace('%23', '#')
       uri = uri.replace('%25', '%')
       return uri

   iii)

   def quoteLiteral(s):
       """Function for creating a literal from a unicode string."""
       safe = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' \
              'abcdefghijklmnopqrstuvwxyz' \
              '0123456789_.-'
       s = list(s.encode('utf-8'))
       for i in range(len(s)):
           if s[i] not in safe:
               s[i] = '%%%02X' % ord(s[i])
       return ''.join(s)

   iv)

   def unquoteLiteral(s):
       """Function for creating a unicode string from a literal."""
       s = s.split('%')
       res = [s[0]]
       del s[0]
       for item in s:
           res.append(chr(int(item[:2], 16)) + item[2:])
       return unicode(''.join(res), 'utf-8')

   v)

   #!/usr/bin/python
   """Code for parsing a "resource" URI."""

   import sys, re

   scheme = 'resource:'

   def unquote(uri):
       """Function for creating a URI-view from an escapedURIv."""
       uri = uri.replace('%3B', ';')
       uri = uri.replace('%3D', '=')
       uri = uri.replace('%23', '#')
       uri = uri.replace('%25', '%')
       return uri

   def unquoteLiteral(s):
       """Function for creating a unicode string from a literal."""
       s = s.split('%')
       res = [s[0]]
       del s[0]
       for item in s:
           res.append(chr(int(item[:2], 16)) + item[2:])
       return unicode(''.join(res), 'utf-8')

   def process(uri):
       # Check for the correct scheme (case insensitively, as
       # required by section 2.2), and strip it
       if uri[:len(scheme)].lower() != scheme:
           raise ValueError("Incorrect scheme")
       uri = uri[len(scheme):]

       parts = uri.split(';')
       bindings = {}
       popairs = []
       result = []

       # Separate the namespace bindings from the property/object pairs
       for part in parts:
           if part.startswith('@'):
               pfx, ns = part[1:].split('=')
               # The namespace is an escapedURIv, so decode it
               bindings[pfx.lower()] = unquote(ns)
           else:
               popairs.append(part)

       # Expand each pair into NTriples-style terms
       for po in popairs:
           property, object = po.split('=')
           if property.startswith('$'):
               property = '<%s>' % unquote(property[1:])
           else:
               pfx, name = property.split(':')
               property = '<' + bindings[pfx.lower()] + name + '>'
           if object.startswith('$'):
               object = '<%s>' % unquote(object[1:])
           else:
               object = '"%s"' % unquoteLiteral(object).encode('utf-8')
           result.append((property, object))
       return result

   def serialize(popairs):
       result = ''
       for popair in popairs:
           result += '_:x %s %s .\n' % popair
       return result.strip()

   def test():
       t = 'resource:@foaf=http://xmlns.com/foaf/0.1/;foaf:nick=sbp'
       print 'property/object pairs...'
       popairs = process(t)
       print popairs
       print 'NTriple serialization, with the node as _:x'
       ntriples = serialize(popairs)
       print ntriples

   if __name__=="__main__":
       test()
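
   The listings above target Python 2.2, where the result of encode()
   is a byte string that can be indexed character by character. Under
   Python 3, text and octets are distinct types, so the literal
   algorithms of section 2.1.2 need a slightly different shape. The
   following is one possible sketch under that assumption; the names
   quote_literal and unquote_literal are illustrative and are not part
   of this document.

```python
# Octets that may appear unescaped in a literal: [A-Za-z0-9_.-]
SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_.-"
)


def quote_literal(text: str) -> str:
    """Create a literal from a unicode string."""
    out = []
    for octet in text.encode("utf-8"):  # iterating bytes yields ints
        if octet in SAFE:
            out.append(chr(octet))
        else:
            out.append("%%%02X" % octet)
    return "".join(out)


def unquote_literal(literal: str) -> str:
    """Create a unicode string from a literal (assumes valid input)."""
    head, *rest = literal.split("%")
    octets = bytearray(head.encode("ascii"))
    for item in rest:
        octets.append(int(item[:2], 16))         # the escaped octet
        octets.extend(item[2:].encode("ascii"))  # unescaped remainder
    return octets.decode("utf-8")


assert unquote_literal(quote_literal("Sean B. Palmer")) == "Sean B. Palmer"
```

   Collecting the octets in a bytearray before decoding keeps
   multi-octet UTF-8 sequences intact, which per-character decoding
   would not.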