RDF

CWM - Closed World Machine

CWM is a popular Semantic Web program that can do the following tasks:-

CWM was written in Python from 2000-10 onwards by Tim Berners-Lee and Dan Connolly of the W3C.

This resource is provided so that people can use CWM, find out what it does (documentation used to be sparse), and perhaps even contribute to its development.

Installing CWM

To install CWM, you will first need to install Python if you don't have it on your machine. The latest version is highly recommended: certainly upgrade if you are using a 1.x.x version, since CWM seems to depend upon SAX (the Simple API for XML). CWM worked properly with Python 2.2 as of 2002-08.

Next, you will need to get the CWM modules. All of the CWM material is developed and hosted on the W3C site, in the SWAP (Semantic Web Area for Play/Semantic Web Application Platform) directory. However, since not all of the files are publicly accessible (many of the files return "403 Forbidden"), Dan Connolly opened up the directory through CVS: /2000/10/swap/ in W3C CVS. The CVS mirror is approximately 50 minutes behind the canonical SWAP directory.

Important note: the latest version of CWM (files of 2002-02-25) is quite slow (check out the bug reports: 1 and 2). TimBL is gradually speeding it up, but note that the 1.82 distribution of CWM is relatively stable, and has the same kind of functionality. My personal recommendation is that you try the latest version first, but if that doesn't work, go for the .tar.gz of v1.82.

The Modules

You should get the following modules and put them into a single directory:-

Reminder: the tar.gz file for CWM 1.82 is available as cwm1.82.tar.gz (Winzip should be able to read tar.gz, but if not, please let me know).

Installation Problems

CWM relies upon the SAX XML parser in Python to parse XML RDF, but this usually does not come with the Python distribution, and has to be installed separately. On Debian you can run "apt-get install python-xml" (tip courtesy of DanB and DanC). The following details noted in sax2rdf.py may also help:-

If you run Python on CygWin, you can just grab the pre-compiled package from dbs. However, when you upgrade Python it seems that you have to reinstall it, which is very frustrating. If in doubt, try to use the standard Windows Python installation from CygWin (you can run Windows programs through CygWin).
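
One quick way to see whether a Python installation already has a usable SAX parser is simply to ask for one. The following is a minimal sketch using the standard xml.sax module; the function name sax_available is made up for this illustration, and a positive result only says that Python can build a parser, not that a given CWM version will be happy with it.

```python
def sax_available():
    """Return True if this Python installation can create a SAX parser."""
    try:
        import xml.sax
        xml.sax.make_parser()
    except Exception:
        # ImportError if the package is missing entirely;
        # SAXReaderNotAvailable if no parser driver can be found.
        return False
    return True


if __name__ == "__main__":
    print("SAX parser available:", sax_available())
```

If this prints False, installing the XML package described above is the first thing to try.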

Getting Started

There are a number of resources available to get you started on CWM and Notation3. The main ones come from TimBL himself: the Notation3 Primer, a page of CWM Examples, and the CWM Homepage. The CWM examples page is very good if you want to get started, but only describes a portion of what CWM can do, and leaves you to figure out the rest.

CWM is run at the command line as normal, and takes a number of different flags. The most common of these are listed on the SWAP page under Command line parameters (now up to date).

One important command line flag missing from that list is --strings. The help text in cwm.py describes it as follows: "--strings Dump :s to stdout ordered by :k whereever [sic] { :k log:outputString :s }".
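
To make the --strings behaviour a little more concrete, here is a rough sketch in Python of what "dump :s ordered by :k" amounts to. It is only an illustration of the ordering idea, not CWM's own code: the function name dump_strings and the use of plain (key, string) tuples are assumptions made for the example, and how CWM actually compares keys is not covered.

```python
def dump_strings(pairs):
    """Concatenate the :s strings, ordered by their :k keys.

    `pairs` is an iterable of (key, string) tuples, standing in for the
    { :k log:outputString :s } statements found in the store.
    """
    ordered = sorted(pairs, key=lambda pair: pair[0])
    return "".join(s for _key, s in ordered)


if __name__ == "__main__":
    # The keys decide the output order, not the order of the statements.
    print(dump_strings([(2, "world\n"), (1, "hello ")]), end="")
```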

As a matter of interest, there is a log file of every time the string "CWM" was mentioned on the RDF IG channel.

CWM On Your Desktop

I created a batch file in Windows for CWM, and I found that I most commonly used only a certain set of commands. You can of course customize your own commands, and perhaps make an sh file, but these are the commands that I find I most commonly use:-

I wrote a short CWM Utility as a Windows BAT or a Python script that can run some of the most popular commands; it allows you to have CWM running on your desktop, or from a very simple menu based interface.
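
If you want to roll your own wrapper, the core of it is just building a command line. The sketch below is a hypothetical helper (it is not the CWM Utility itself): the name build_cwm_command and the defaults "python" and "cwm.py" are assumptions, and it only knows about the two flags mentioned on this page, --think and --strings.

```python
def build_cwm_command(inputs, think=False, strings=False,
                      python="python", script="cwm.py"):
    """Build an argument list for running CWM over some input files.

    `python` and `script` are assumed locations; adjust them to wherever
    your interpreter and cwm.py actually live.
    """
    argv = [python, script]
    argv.extend(inputs)
    if think:
        argv.append("--think")
    if strings:
        argv.append("--strings")
    return argv


if __name__ == "__main__":
    command = build_cwm_command(["daml-ex.n3", "schema-rules.n3"], think=True)
    print(" ".join(command))
```

The resulting list can be handed to subprocess.run from the standard library, which avoids any shell quoting problems with file names.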

CWM NTriples Cleaner

Because CWM does not produce valid NTriples (in version 1.82 and before, at least), I wrote a Python script that will clean it up: cwmntclean.py.

CWM Example Code

CWM can do many things: merge documents; apply rules (by adding or filtering); convert between XML RDF, NTriples, and Notation3; flatten contexts; perform queries (just a subset of rules); do math, string functions, and get os variables; and so on... It's difficult to know where to start getting into it.

The SWAP test directory contains a number of experiments run by Tim and Dan to test CWM's functions. Here is a summary of the pieces...

Regression Test

There is a large regression test which forms the basis of the things that get checked regularly in CWM.

Semantic Web Utilities

One of the things that you notice when you have used CWM for a while is how close in philosophy it is to the set of *nix tools*. You tend to start setting up filters (N3 rules files etc.) that can do certain tasks. Here are just some of the files that have already been developed.

RDF Lint and check.n3 are basically things that assist with validation, but note that there is not really such a thing as "invalid" data on the Semantic Web: there is only fairly consistent, and inconsistent data. The utilities mentioned go through the data and flag the inconsistencies.

Quote from the SWAP homepage, exposing a central design philosophy:-

Cwm will run as a unix command, and is designed to be usable as a simple data manipulator for RDF on the lines of sed, awk, etc or xsl. - SWAP

Tests

Note that according to DanC, "Closed World Machine" is a misnomer because it can get documents from outside of what it is fed, by way of HTTP GET. It has two built-ins for getting stuff from files, log:semantics (gets the file and parses it as a set of N3 formulae) and log:content (gets the file and returns it as a string). Note also that in the strict sense of the phrase "Closed World", being able to gather files via HTTP GET is not a test of closed-worldness (thanks to Bijan Parsia for pointing that out).

Notation3 Grammar: Tokenizers etc.

CWM and Notation3 are synonymous, as N3 is the serialization of RDF that CWM was built up around. However, Notation3 itself was just designed as a Wiki RDF format by TimBL and DanC, and as such was never formally specified by a Working Group, nor recommended by the W3C. This has left implementations rather inconsistent, notwithstanding DanC's efforts to standardize the language.

I did a long survey of local N3 grammars, culminating with The Great QName Survey. The list of N3 implementations as of 2002-01 is as follows:-

Miscellaneous

CWM Web Services

There are currently a couple of Web services for CWM, including one on this very page.

SWAG Service

SWAG have set up an N3 to RDF online service, which proves to be quite popular. It can also, in fact, convert from RDF to N3, and think about the stuff, so that you can enter rules etc.

W3C Service

The W3C maintain a small service, running Notation3.py as a CGI, referenced at the top of the N3 spec., and largely obsoleted by the SWAG service.

Paste 'n' Go

Here is a form for you to paste some N3, and convert into XML RDF (powered by some Aaron Swartz magic):-


CWM Built-Ins

There are a number of modules being written for CWM that let CWM do "special" things when it finds a rule with a certain predicate in it. For example, if a rule contains "<somefile.n3> log:content ?y", then CWM will actually open up "somefile.n3" and return its content as a string literal for ?y (N.B. adding the ? before a name is a shorthand for universally quantified variables).
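
The general idea of a built-in (a predicate that triggers some code rather than a lookup in the store) can be sketched in a few lines of Python. This is not how CWM itself is organised: the BUILTINS table, the evaluate function, and the decision to treat the subject as a local file path are all assumptions made to keep the example small.

```python
from pathlib import Path

LOG = "http://www.w3.org/2000/10/swap/log#"


def log_content(subject):
    """Stand-in for log:content: return the file's text as one string."""
    return Path(subject).read_text(encoding="utf-8")


# Hypothetical dispatch table: predicate URI -> Python function.
BUILTINS = {LOG + "content": log_content}


def evaluate(subject, predicate):
    """Return the computed object for a built-in predicate, else None."""
    func = BUILTINS.get(predicate)
    if func is None:
        return None  # an ordinary predicate: nothing special happens
    return func(subject)
```

With this in place, evaluate("somefile.n3", LOG + "content") plays the role of binding ?y in the rule above, assuming somefile.n3 exists in the current directory.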

CWM_Log (built into Llyn)

The "log:" namespace is very important:-

http://www.w3.org/2000/10/swap/log#

It contains the log:implies, log:forSome, and log:forAll pseudo-properties that are used for First Order Predicate Logic. However, there are a number of other terms in the namespace that do a certain amount of stuff (from $Id: llyn.py,v 1.4 2001/11/19 15:26:14 timbl Exp $):-

Note on using some of these together:-

{ ( [ is log:semantics of <../daml-ex.n3> ] 
    [ is log:semantics of <../invalid-ex.n3> ]
    [ is log:semantics of <../schema-rules.n3> ] )
        log:conjunction [ log:conclusion :G]}
                                 log:implies { :result :is :G }. 
The above is a much more complicated way of writing the cwm command line "cwm daml-ex.n3 invalid-ex.n3 schema-rules.n3 --think".
- http://dev.w3.org/cvsweb/2000/10/swap/test/includes/conjunction.n3
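
For readers who think more easily in Python than in N3, the two steps in that example can be approximated as follows: log:conjunction behaves like a union of sets of triples, and log:conclusion behaves like applying rules until nothing new turns up. This is a loose sketch under those assumptions; the function names, the tuple representation of triples, and the sample subclass_rule are all invented for the illustration and bear no relation to the code in llyn.py.

```python
def conjunction(*graphs):
    """Merge several graphs (iterables of triples) into one set."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def conclusion(graph, rules):
    """Apply every rule repeatedly until no new triples are produced."""
    closure = set(graph)
    while True:
        new = set()
        for rule in rules:
            # Each rule is fully evaluated before the closure is changed.
            new |= set(rule(closure)) - closure
        if not new:
            return closure
        closure |= new


def subclass_rule(triples):
    """Sample rule: subClassOf is transitive."""
    for (a, p1, b) in triples:
        for (c, p2, d) in triples:
            if p1 == p2 == "subClassOf" and b == c:
                yield (a, "subClassOf", d)


if __name__ == "__main__":
    data = {("Cat", "subClassOf", "Mammal")}
    schema = {("Mammal", "subClassOf", "Animal")}
    for triple in sorted(conclusion(conjunction(data, schema), [subclass_rule])):
        print(triple)
```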

CWM_String

http://www.w3.org/2000/10/swap/string#

This module contains built-ins that let you process strings. The following properties are defined:-

Try the string schema, and the cwm_string.py module for more information.

CWM_OS

http://www.w3.org/2000/10/swap/os#

Try the os schema, and the cwm_os.py module; it contains a property that will make CWM get the appropriate OS environment variable.
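
In plain Python, the equivalent lookup is a one-liner on os.environ. The sketch below only shows that underlying operation; the function name environ_value is hypothetical, and returning None for an unset variable is an assumption about what would be convenient rather than a statement of what cwm_os.py does.

```python
import os


def environ_value(name, default=None):
    """Return the value of an environment variable, or `default` if unset."""
    return os.environ.get(name, default)


if __name__ == "__main__":
    print(environ_value("HOME", "(HOME is not set)"))
```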

CWM_Crypto

http://www.w3.org/2000/10/swap/crypto#

The module is available as cwm_crypto.py. cf. Cryptography In CWM: Hashes. You'll need to add a couple of obvious lines to llyn.py in order to register the built-ins.

The properties that one can use at the moment are just hash functions, that is, CWM will return the hash of the string:-

   crypto:md5 a rdf:Property; rdfs:label "md5"; 
      rdfs:comment "The MD5 hash of a string"; 
      rdfs:domain string:String; rdfs:range string:String .

   crypto:sha a daml:UnambiguousProperty, 
      daml:UniqueProperty; rdfs:label "sha"; 
      rdfs:comment "The SHA hash of a string"; 
      rdfs:domain string:String; rdfs:range string:String .

Note how SHA is assigned a higher trust level than its MD5 counterpart.
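
For comparison, this is what the same two hashes look like when computed directly with Python's hashlib. Several things here are assumptions rather than facts about cwm_crypto.py: that "sha" means SHA-1, that strings are encoded as UTF-8 before hashing, and that the result is given as a hexadecimal string.

```python
import hashlib


def md5_hex(text):
    """MD5 hash of a string, as hexadecimal (UTF-8 encoding assumed)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sha_hex(text):
    """SHA-1 hash of a string, as hexadecimal (SHA-1 is an assumption)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(md5_hex("Semantic Web"))
    print(sha_hex("Semantic Web"))
```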

CWM_Math

CWM: Mathematical Built-Ins. "CWM can now do addition, multiplication, subtraction, division, remainders, negation, exponentiation, count the members in a DAML list, and do the normal truth checking functions, only sub classed for numeric values."

For example:-

{ :x math:sumOf ([ math:quotientOf ("7" "2") ]
   [ math:exponentiationOf ([ math:remainderOf  ("7" "2")] "10000000") ]
   [ is math:memberCount of ("a" "b" "c" "d" "e") ]) } log:implies
{ :x :valueOf "(7 / 2) + ((7 % 2)^10000000) + 5 [should be 9.5]" } .

gives the correct output:-

"9.5" :valueOf "(7 / 2) + ((7 % 2)^10000000) + 5 [should be 9.5]" .

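The arithmetic in that rule can be checked by hand in Python. This is just the same sum written out step by step (using Python 3, where / is true division); the variable names are made up for the illustration, and the N3 version passes its numbers as string literals where this one uses plain integers.

```python
quotient = 7 / 2                 # math:quotientOf ("7" "2")
remainder = 7 % 2                # math:remainderOf ("7" "2")
power = remainder ** 10000000    # math:exponentiationOf (remainder "10000000")
member_count = len(["a", "b", "c", "d", "e"])  # math:memberCount

total = quotient + power + member_count        # math:sumOf
print(total)  # 9.5
```
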
CWM_URI

In development by Mark Nottingham.

20:51:57 <mnot> I'm working on a cwm_uri module, but I need to be 
able to instantiate complex, anonymous objects based on the subject, 
so it's slow going
- http://ilrt.org/discovery/chatlogs/rdfig/2001-12-01.txt

CWM_XPath

Also in development by mnot, but this time with running code and tests! Try: CWM built-in for XPath. It requires PyXML to make it run (as does CWM in general for XML RDF processing).

Creating your own builtins

In fact, the builtin syntax is rather simple - anyone with a fundamental knowledge of Python should be able to create a new builtin module just by going through the current builtins modules.

"Real" Projects

Deploying CWM on a large scale was never really on the cards, although lately it appears to be outgrowing its "play/demonstration code" status. CWM has managed to get implemented in a few projects.

NTriples

NTriples is a special fixed subset of Notation3: without formulae, multiple object or po combinations, multiline literals, blank bNodes, or QNames; just good ol' triples, one per line. The NTriples specification is available from the W3C, edited by Dave Beckett and Art Barstow. NTriples is more expressive than XML RDF, easy to parse, and is an excellent lowest common denominator for serializations.
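
Because the format is so regular, a rough "does this line look like a triple?" check fits in a single regular expression. The sketch below is a deliberate simplification and not a conforming NTriples parser: it ignores comments and blank lines, does not validate URIs or escape sequences properly, and the name looks_like_ntriple is invented for the example.

```python
import re

_URI = r"<[^<>\s]*>"
_BNODE = r"_:[A-Za-z][A-Za-z0-9]*"
# A quoted string on one line, optionally followed by @lang or ^^<datatype>.
_LITERAL = (r'"(?:[^"\\\n]|\\.)*"(?:@[a-z]+(?:-[a-z0-9]+)*|\^\^'
            + _URI + r")?")

_TRIPLE = re.compile(
    r"^\s*(?:" + _URI + "|" + _BNODE + r")\s+" + _URI
    + r"\s+(?:" + _URI + "|" + _BNODE + "|" + _LITERAL + r")\s*\.\s*$"
)


def looks_like_ntriple(line):
    """Return True if the line has the subject/predicate/object/dot shape."""
    return bool(_TRIPLE.match(line))


if __name__ == "__main__":
    print(looks_like_ntriple('<http://example.org/a> <http://example.org/b> "c" .'))
    print(looks_like_ntriple(":a :b :c ."))  # QNames are N3, not NTriples
```

A check like this is handy as a first filter on CWM's output before handing it to a stricter tool.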

Schemata That Use N3

TimBL's "log", "string", "os", and "crypto" schemata, the PIM doc and contact schemata, and the EARL 0.95 schema.

CWM Clones

Euler

The Euler proof mechanism is a proof engine written in Java by Jos De Roo that uses Euler paths to infer without fear of endless loops. It can parse Notation3, including N3 rules.

CWMClone in Prolog

CWMClone is an implementation of CWM in Prolog, under development by Bijan Parsia. The CWMClone page itself contains some useful instructions, not only on running the program, but also on rolling your own CWM.

CWMClone is a development project at this stage, but does work rather solidly.

Eep

I also wrote a CWM clone (that wasn't initially meant to be a CWM clone): Eep RDF API, Inference Engine, and NTriples/N3 Parser.

Roll Your Own CWM

There are plenty of things to consider when rolling your own version of CWM, besides the obvious "how many features should I implement?". Basically, CWM is comprised of the following parts:-

Each of these parts can be treated as essentially separate units.

@@ TODO: more stuff in this section.

Colophon: Quotes

I hope to reach cwm-enlightenment eventually, but I'm not holding my breath. - DanC, #rdfig 2001-05-07 00:18