
New URI Schemes: 99% Harmful

This is a guide for anyone who is thinking about creating a new URI scheme. It explains why doing so is usually a bad idea, describes alternatives to registering URI schemes, and offers some hints and tips for when creating a new URI scheme really is necessary.

What Is A URI Scheme?

URIs are the Web identifiers that start with a small string and a colon. Most people are familiar with HTTP URIs such as "http://www.w3.org/Addressing/", mail URIs such as "mailto:sean@mysterylights.com", and other URIs such as "data:,someData". Anything that has a URI is on the World Wide Web, because the very definition of being "on the Web" is having a URI. A URI scheme is the string that occurs before the first colon, which forms a base for the URI. In the examples above, the schemes are "http", "mailto", and "data".
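As a rough illustration of the definition above, the following Python sketch pulls the scheme out of a URI by taking everything before the first colon. The function name scheme_of and the validation pattern are assumptions made for this example (the pattern follows the generic scheme syntax of a letter followed by letters, digits, "+", "-" or "."); it is not a full URI parser.

```python
import re

# Generic scheme syntax: a letter, then letters, digits, "+", "-" or ".".
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def scheme_of(uri):
    """Return the lowercased scheme: the text before the first colon."""
    head, colon, _rest = uri.partition(":")
    if not colon or not _SCHEME_RE.fullmatch(head):
        raise ValueError(f"no valid scheme in {uri!r}")
    return head.lower()

for example in (
    "http://www.w3.org/Addressing/",
    "mailto:sean@mysterylights.com",
    "data:,someData",
):
    print(scheme_of(example))
```

Run as written, this prints the three schemes named in the text, one per line.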

URI schemes are designed so that processors of URIs can gauge what to use the URIs for in a certain context. For example, when you put the URI "http://www.w3.org/" into your Web browser, it may open a connection to the w3.org server on port 80, request the document "/" using GET, and receive a response which it usually renders, i.e. a Web page. Caveat: this does not mean that a URI scheme is simply a message to tell processors to do a certain action: the processor has to work that out for itself. In other words, a URI is a noun, not a verb.
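To make the "noun, not a verb" point concrete, here is a small Python sketch in which two hypothetical processors are handed the same URI and each decides for itself what to do with it. The function names are invented for this example, nothing is actually sent over the network, and the default of port 80 simply mirrors the description above.

```python
from urllib.parse import urlsplit

def browser_plan(uri):
    """Describe what a browser-like processor might do with an http URI."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http URIs")
    port = parts.port or 80   # no explicit port in the URI, so assume 80
    path = parts.path or "/"  # an empty path is requested as "/"
    return f"connect to {parts.hostname}:{port}, send GET {path}, render the response"

def link_lister_plan(uri):
    """Describe what a different processor might do with the very same URI."""
    return f"record {uri} as a link target without fetching it"

uri = "http://www.w3.org/"
print(browser_plan(uri))
print(link_lister_plan(uri))
```

The URI is identical in both calls; only the processor's choice of action differs.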

Why Create A New URI Scheme?

Before creating a new URI scheme, the following requirements must be fulfilled:-

There is some resource which needs to be represented on the Web, but:-
- There is no current URI scheme that can identify the resource to a degree of satisfaction, and/or the objects are already in a well known, well used universal space
- A URN namespace is not sufficient for identifying the resources

Let's look at some examples of when a new URI scheme has been needed.

Useful "New" URI Schemes

data:

Until "data:" came along, there was no particular mechanism for identifying raw data (with an optional MIME type) on the World Wide Web. For example, if you wanted to identify the plain text string "someString", you couldn't do so as a first class object. This also meant that URIs themselves could not be objects on the Web (the notion of an identifier being a resource).

This scheme met the requirement above that "There is no current URI scheme that can identify the resource to a degree of satisfaction".
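For illustration, a "data:" URI can be assembled by hand. The sketch below assumes the general form data:[type][;base64],payload; the helper name make_data_uri is hypothetical, and percent-encoding the payload with urllib.parse.quote is one reasonable choice rather than the only one.

```python
import base64
from urllib.parse import quote

def make_data_uri(payload, mime_type="", use_base64=False):
    """Build a data: URI from raw bytes, with an optional MIME type."""
    if use_base64:
        body = base64.b64encode(payload).decode("ascii")
        return f"data:{mime_type};base64,{body}"
    # Percent-encode every byte that is not an unreserved character.
    return f"data:{mime_type}," + quote(payload, safe="")

print(make_data_uri(b"someData"))
print(make_data_uri(b"hello world", "text/plain"))
print(make_data_uri(b"hi", "text/plain", use_base64=True))
```

The first call reproduces the "data:,someData" example used earlier; the other two show the optional MIME type and the base64 form.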

tel:

The "tel:" URI scheme was devised for identifying international telephone numbers on the World Wide Web. For example, this makes it possible to link to a telephone number on a Web page.

Tim Berners-Lee used this URI scheme (he actually called it "phone", but "tel" was eventually adopted instead) in one of his Design Issues documents as an example of how URIs are truly universal:-

Let us take, for example, the exercise of mapping an international telephone number onto the URL. International telephone numbers are hierarchical. For example, the meaning of and the format of a telephone number depends on the country, but there is a universal format for a telephone number in the world which can be understood everywhere. [...] Mapping this onto the URL syntax, the double slash would be used to indicate that one is starting from the top of the tree, so the number
+1 (617) 253-5708
- Universal Resource Identifiers -- Axioms of Web Architecture

freenet:

The freenet system was designed as a kind of anonymous decentralized alternative to HTTP. However, because the Web is all-encompassing, it was quite easy to come up with a new set of identifiers to point to the new freenet space. The scheme that was chosen is, naturally enough, the "freenet:" URI scheme. Unfortunately, this scheme is unregistered (which is a big no-no), but the concept of the scheme is perfectly valid. Register that scheme!

Why Not Create A New URI Scheme?

There is a huge problem with people creating new URI schemes: many of them do not understand what URIs even are, and do not seek to register the schemes they create.

URI schemes are one of the most valuable resources that the Web has to offer, and are intended only to address information spaces that are globally useful. New URI schemes that address spaces which are not useful to the Web in general, that aren't registered, or that break some axioms of Web architecture are the most harmful sort.

Unregistered schemes should not be deployed widely and should not be used except experimentally.

- An Index of WWW Addressing Schemes

It is often said that new URI schemes in and of themselves are not actually harmful; it is only unregistered proprietary schemes that are harmful. Of course, it is still technically possible for registered URI schemes to violate URI axioms, but new URI schemes and URN namespaces have to go through a process of peer review, and are therefore less likely to do so.

Proprietary Offline URI Schemes

So, what about URI schemes that have been created for use by companies offline? Surely they don't need to be registered. Well, it depends upon the application. Is the application really a proprietary-offline-only sort of thing? In many cases, it can be just as easy to use one of the alternatives to new URI schemes, and yet much more beneficial, because they are proper registered URIs.

One very important application area in which new URI schemes should not be created, even for offline use, is RDF. RDF is meant to be a non-proprietary format that people can easily use off or on the Web: there should be no distinction between the two. For example, if you start using "blargh:" for URIs offline, and then you decide that you want to publish the data online, you will have to convert those "blargh:" URIs (which aren't really URIs) into proper URIs that can be used on a global scale. So why not just use proper URIs in the first place? The proper URI alternatives to new URI schemes are actually much easier to create, register, and maintain than many people might think.

Alternatives To Registering New URI Schemes

I should note at this point that I agree that registering a new URI scheme is quite a lengthy process, and that this is an obstacle to good URI schemes. But note that URIs are there to be a global universal set of identifiers, and hence when creating new ones, the Web community and the IETF in particular need to be very sure that it is going to be of benefit to the world. If companies just start taking URI schemes on a whim and do not even register them, it will lead to chaos, as duplicate schemes get taken. URIs using unregistered URI schemes are not proper URIs.

Informal URNs

So, the alternatives. The first idea that springs to mind is informal URNs. More about URNs can be found in RFC 2611; see especially Section 4 II, which outlines the process for informal URN registration. It takes about 3 weeks following a very simple process to register an informal URN, and often one receives great feedback on the namespace which may enhance both it and your understanding of URNs (all informal URNs are reviewed by a technical list). So instead of "blargh:", you'd end up with something like "urn:urn-1:", i.e. only 3 characters more!

TAG URIs

Another, and possibly better, alternative is the TAG URI scheme. This is a simple way to create persistent and transitory identifiers very quickly and cheaply. All you need is a domain name, or an email address, and combined with a date, one has a truly persistent identifier base under which one can create URIs. For example, if you have the domain name xyz.org, and you can prove that you owned it on 1st January 2001, then instead of "blargh:", you could use: "tag:xyz.org,2001:", which is 10 characters more, but note that you don't have to register it at all; you just have to follow the syntax requirements of the TAG URI specification (which is currently in the process of being registered).
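As a sketch of how cheaply such identifiers can be minted, the Python below builds TAG URIs from the pieces described above: a domain name (or email address), a date, and a local name. The helper names tag_base and make_tag_uri, the local name "widget-42", and the validation rules (including acceptance of a fuller YYYY-MM or YYYY-MM-DD date) are assumptions made for this example; the TAG URI specification itself is the authority on the exact syntax.

```python
import re

# Assumed date forms: YYYY, YYYY-MM or YYYY-MM-DD.
_DATE_RE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")

def tag_base(authority, date):
    """Return the identifier base, e.g. "tag:xyz.org,2001:"."""
    if not authority or any(ch in authority for ch in ",:"):
        raise ValueError(f"unusable authority: {authority!r}")
    if not _DATE_RE.fullmatch(date):
        raise ValueError(f"unusable date: {date!r}")
    return f"tag:{authority},{date}:"

def make_tag_uri(authority, date, specific):
    """Mint one identifier under the base for the given local name."""
    return tag_base(authority, date) + specific

base = tag_base("xyz.org", "2001")
print(base)
print(make_tag_uri("xyz.org", "2001", "widget-42"))
print(len(base) - len("blargh:"))  # extra characters compared with "blargh:"
```

No registry is consulted at any point; the only prerequisite is control of the domain name or email address on the stated date.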

"esl" URIs

A third possible alternative is a URI scheme that I am currently working on, the "esl" scheme, which allows anyone to create identifiers using a label and a digital signature. Note that this scheme is still very much experimental, has not been registered, and the identifiers that result are often quite lengthy. However, you get the added bonus that your identifiers are very secure, thanks to the digital signatures.

URIs And The Semantic Web

People will continue to create URI schemes, because there is an inherent lack of understanding in the development community about them and about how URIs relate to the Semantic Web. We are pressing for more education and outreach publications when it comes to URIs, but it's slow work. At any rate, the concept of the Semantic Web (which is the main purpose of RDF) is that the whole system is decentralized. When people create their own proprietary URI schemes, it is a massive threat to that decentralization and openness, and as such, people should be advised against doing it. Thankfully, there are the simple alternatives that I have listed above, and I think that people should use them instead.

Axioms Of URIs

So, you've read the above, you understand what URI schemes are, and you still think that your new URI scheme is justified. Now you're wondering what the axioms are that pertain to URIs and the creation of new URI schemes. Note that many of these axioms were excerpted or modified from TimBL's Universal Resource Identifiers -- Axioms of Web Architecture:-

- A URI is only ever defined by its implementations (context): Implementations of URIs introduce a scope to the use of URIs. URIs are always contextualized.
- URIs should be persistent within their given context: URIs should consistently refer to the same thing; on a qualitative rather than quantitative scale.
- The use of a particular URI should be well defined and unique: A URI can only ever identify one resource at any one time, and this mapping should be set out normatively somewhere.
- Anything that can be identified can have a URI: Anything which is important should be identified with a URI.
- URIs are opaque: A URI should not contain information in it which may lead to abuse; file name extensions should not be used to predict the type of a file.

These axioms are important to the stability of the World Wide Web itself, as it is often said that URIs are the most fundamental invention of the WWW.