CWM - Closed World Machine

CWM is a popular Semantic Web program that can do the following tasks:-

Parse and pretty-print the following RDF formats: XML RDF, Notation3, and NTriples
Store triples in a queryable triples database
Perform inferences as a forward chaining FOPL inference engine
Run builtin functions, such as comparing strings and retrieving resources, from an extensible builtins suite

CWM was written in Python from 2000-10 onwards by Tim Berners-Lee and Dan Connolly of the W3C.

This resource is provided so that people can use CWM, find out what it does (documentation used to be sparse), and perhaps even contribute to its development.

Installing CWM

To install CWM, you will first need to install Python if you don't have it on your machine. The latest version is highly recommended: certainly upgrade if you are using a 1.x.x version, since CWM seems to depend upon SAX (the Simple API for XML). CWM worked properly with Python 2.2 as of 2002-08.

Next, you will need to get the CWM modules. All of the CWM material is developed and hosted on the W3C site, in the SWAP (Semantic Web Area for Play/Semantic Web Application Platform) directory. However, since not all of the files are publicly accessible (many of the files return "403 Forbidden"), Dan Connolly opened up the directory through CVS: /2000/10/swap/ in W3C CVS. The CVS mirror is approximately 50 minutes behind the canonical SWAP directory.

Important note: the latest version of CWM (files of 2002-02-25) is quite slow (check out the bug reports: 1 and 2). TimBL is gradually speeding it up, but note that the 1.82 distribution of CWM is relatively stable, and has the same kind of functionality. My personal recommendation is that you try the latest version first, but if that doesn't work, go for the .tar.gz of v1.82.

The Modules

You should get the following modules and put them into a single directory:-

converter-cgi.py - A CGI interface (non-essential)
cwm.py - CWM itself, as an interface to all of the modules
cwm_crypto.py - CWM Cryptographic builtins
cwm_math.py - CWM Mathematical builtins (cf. the math module in Python)
cwm_os.py - CWM OS builtins (cf. the os module in Python)
cwm_string.py - CWM String builtins (cf. the string module in Python)
llyn.py - This is the store & inference engine part, where most of the magic takes place
notation3.py - The Notation3 parser and serializer
RDFSink.py - An RDF Sink
sax2rdf.py - A SAX RDF Handler
thing.py - Interns the URIs and Strings for use elsewhere

Reminder: the tar.gz file for CWM 1.82 is available as cwm1.82.tar.gz (Winzip should be able to read tar.gz, but if not, please let me know).

Installation Problems

CWM relies upon the SAX XML parser in Python to parse XML RDF, but this usually does not come with the Python distribution, and has to be installed separately. On Debian you can run "apt-get install python-xml" (tip courtesy of DanB and DanC). The following details noted in sax2rdf.py may also help:-

If you run Python on CygWin, you can just grab the pre-compiled package from dbs. However, when you upgrade Python it seems that you have to reinstall it, which is very frustrating. If in doubt, try to use the standard Windows Python installation from CygWin (you can run Windows programs through CygWin).
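
A quick way to find out whether your Python installation can see a SAX parser at all is to ask the standard xml.sax module for one. This is only a sketch: the helper name sax_available is made up for illustration, and it assumes a Python that ships xml.sax in its standard library (recent Pythons bundle an expat-based driver, so the check normally succeeds there).

```python
import xml.sax


def sax_available():
    """Return True if xml.sax can construct a parser, False otherwise."""
    try:
        xml.sax.make_parser()
    except xml.sax.SAXReaderNotAvailable:
        # No usable parser driver was found on this installation.
        return False
    return True


if __name__ == "__main__":
    print("SAX parser available:", sax_available())
```

If this reports False, installing the XML package for your platform (as described above) is the likely fix.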

Getting Started

There are a number of resources available to get you started on CWM and Notation3. The main ones come from TimBL himself: the Notation3 Primer, a page of CWM Examples, and the CWM Homepage. The CWM examples page is very good if you want to get started, but only describes a portion of what CWM can do, and leaves you to figure out the rest.

CWM is run at the command line as normal, and takes a number of different flags. The most common of these are listed on the SWAP page under Command line parameters (now up to date).

One important command line flag left off of the list is --strings. This prints out strings such that: "--strings Dump :s to stdout ordered by :k whereever [sic] { :k log:outputString :s }" (from cwm.py).
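
To make that description concrete, here is a rough Python restatement of what the flag is described as doing: collect every (:k, :s) pair, sort by key, and write out the strings. It is a sketch of the idea only, not code from cwm.py; the function name dump_strings is hypothetical, and it assumes the keys are plain values that Python can compare.

```python
def dump_strings(pairs):
    """Concatenate the :s strings, ordered by their :k keys."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    return "".join(s for _key, s in ordered)


# Each pair stands for one { :k log:outputString :s } match.
pairs = [(2, "world\n"), (1, "hello, ")]
print(dump_strings(pairs), end="")
```

This is why rules files that build up a document (XHTML, dot files and so on) attach a sortable key to each fragment of output.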

As a matter of interest, there is a log file of every time the string "CWM" was mentioned on the RDF IG channel.

CWM On Your Desktop

I created a batch file in Windows for CWM, and I found that I most commonly used only a certain set of commands. You can of course customize your own commands, and perhaps make an sh file, but these are the commands that I find I most commonly use:-

Think: python cwm.py %1 --think > %1.think
Think and Purge: python cwm.py %1 --think --purge > %1.purge
N3 to XML RDF: python cwm.py %1 -rdf > %1.rdf
XML RDF to N3: python cwm.py -rdf %1 -n3 > %1.n3
N3 to NTriples*: python cwm.py %1 -ntriples -bySubject > %1.temp && python cwmntclean.py %1.temp > %1.nt

I wrote a short CWM Utility as a Windows BAT or a Python script that can run some of the most popular commands; it allows you to have CWM running on your desktop, or from a very simple menu based interface.
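
If you would rather roll your own wrapper, the commands listed above map quite naturally onto a small Python table. The following is a minimal sketch, not the utility mentioned above: the names TASKS, build_command, and run are invented for illustration, and it assumes cwm.py sits in the current directory.

```python
import subprocess
import sys

# Menu labels mapped to (cwm arguments, output suffix). "{f}" stands for
# the input file, like %1 in the batch file.
TASKS = {
    "think": (["{f}", "--think"], ".think"),
    "purge": (["{f}", "--think", "--purge"], ".purge"),
    "n3-to-rdf": (["{f}", "-rdf"], ".rdf"),
    "rdf-to-n3": (["-rdf", "{f}", "-n3"], ".n3"),
}


def build_command(task, filename, python=sys.executable):
    """Return (argv, output_path) for one of the tasks above."""
    args, suffix = TASKS[task]
    argv = [python, "cwm.py"] + [arg.format(f=filename) for arg in args]
    return argv, filename + suffix


def run(task, filename):
    """Run the task, redirecting cwm's stdout to the output file."""
    argv, out_path = build_command(task, filename)
    with open(out_path, "w") as out:
        return subprocess.run(argv, stdout=out).returncode
```

Calling run("think", "test.n3") would then behave like the "Think" line of the batch file, writing to test.n3.think.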

CWM NTriples Cleaner

Because CWM does not produce valid NTriples (in version 1.82 and before, at least), I wrote a Python script that will clean it up: cwmntclean.py.

CWM Example Code

CWM can do many things: merge documents; apply rules (by adding or filtering); convert between XML RDF, NTriples, and Notation3; flatten contexts; perform queries (just a subset of rules); do math, string functions, and get os variables; and so on... It's difficult to know where to start getting into it.

The SWAP test directory contains a number of experiments run by Tim and Dan to test CWM's functions. Here is a summary of the pieces...

Regression Test

There is a large regression test which forms the basis of the things that get checked regularly in CWM.

Semantic Web Utilities

One of the things that you notice when you have used CWM for a while is how close in philosophy it is to the set of *nix tools. You tend to start setting up filters (N3 rules files etc.) that can do certain tasks. Here are just some of the files that have already been developed.

rules.n3 A file containing the RDF and DAML schema, plus a set of rules. This rules file is just a part of the RDF Lint stuff; it also contains rules for filtering out inconsistencies
check.n3 this tests "whether a schema mentions the predicates used in this data", and forms a good Schema "validator"
rdfstoxhtml.n3 - an RDF Schema to XHTML conversion utility. Run it as: TARGET=myfile.n3; cwm rdfstoxhtml.n3 --think --strings > out.html (it has a few bugs, but nicely demonstrates the "strings" command line flag)
schema-rules.n3 - another Schema validator in N3
forgetDups.n3 is a filter that gets rid of duplicates, using the resource with the shortest URI.
axioms.n3 is a list processing set of axioms; quote: "List generators are things which work as lists and also if you use them as a predicate they construct a list one longer." I couldn't get it to work in CWM 1.82
sameThing.n3 this is basically a subset of some of the RDF Lint rules, to find all equivalences
usPlace2LatLong.n3 - this isn't exactly a generic tool, but it is a very good demonstration of the CWM builtins; especially string:scrape
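
The idea behind forgetDups.n3 (keep the resource with the shortest URI) is easy to restate outside of N3. The sketch below is a Python paraphrase of that idea, not a translation of the rules file: the function names are hypothetical, the equivalence classes are assumed to be given, and breaking ties alphabetically is my own assumption.

```python
def canonical(equivalent_uris):
    """Pick the shortest URI; ties are broken alphabetically (an assumption)."""
    return min(equivalent_uris, key=lambda uri: (len(uri), uri))


def forget_dups(triples, equivalence_classes):
    """Rewrite every term to its class's canonical URI and drop duplicates."""
    replace = {}
    for group in equivalence_classes:
        keep = canonical(group)
        for uri in group:
            replace[uri] = keep
    return {
        tuple(replace.get(term, term) for term in triple)
        for triple in triples
    }
```

Because the result is a set, two triples that become identical after rewriting collapse into one.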

RDF Lint and check.n3 are basically things that assist with validation, but note that there is not really such a thing as "invalid" data on the Semantic Web: there is only fairly consistent, and inconsistent data. The utilities mentioned go through the data and flag the inconsistencies.

Quote from the SWAP homepage, exposing a central design philosophy:-

Cwm will run as a unix command, and is designed to be usable as a simple data manipulator for RDF on the lines of sed, awk, etc or xsl. - SWAP

Tests

contexts.n3 A very early contexts test
resolves-rdf.n3 an early resolution test
lists-simple.n3 complex list tests. In reality, you won't need to implement things such as empty lists, but it's good to check
quantifiers.n3 tests for quantification: a lot of tests!
t10.n3 is quite a seminal test file, checking out the log:semantics function and demonstrating that CWM does not do backwards chaining
strquot.n3 some string quoting tests
map/data.n3 and map/q.n3 are two files involved in my little coordinates processing experiment. They're useful for testing because the map data is relatively large. Filtering using: "cwm data.n3 --filter=q.n3 > out.n3" makes for a good benchmark test, cf. Re: new cwm release

Note that according to DanC, "Closed World Machine" is a misnomer because it can get documents from outside of what it is fed, by way of HTTP GET. It has two built-ins for getting stuff from files, log:semantics (gets the file and parses it as a set of N3 formulae) and log:content (gets the file and returns its content as a string). Note also that in the strict sense of the phrase "Closed World", being able to gather files via HTTP GET is not a test of closed-worldness (thanks to Bijan Parsia for pointing that out).

Notation3 Grammar: Tokenizers etc.

CWM and Notation3 are synonymous, as N3 is the serialization of RDF that CWM was built up around. However, Notation3 itself was just designed as a Wiki RDF format by TimBL and DanC, and as such was never formally specified by a Working Group, nor recommended by the W3C. This has left implementations rather inconsistent, notwithstanding DanC's efforts to standardize the language.

I did a long survey of local N3 grammars, culminating with The Great QName Survey. The list of N3 implementations as of 2002-01 is as follows:-

DesignIssues: Notation 3
SWAP: notation3.py
SWAP: rdfn3.g - "a Yapps grammar for RDF Notation 3" cf. rdfn3_yapps.py, rdfn3-gram.html - "RDF Notation3 Grammar", in XHTML
N3: notation3.py - DanC's original implementation of Notation3.py
SWAP: n3spark.py - "a re-implementation of RDF/n3 syntax using the SPARK tools"
flaten3: lexer.l - by Sandro Hawke
Blindfold: n3.bnf
RDF::Notation3 Notation3.pm
Eep: n3.py

Miscellaneous

excerpt.n3 a method of doing excerpts in Notation3

nurdle.n3 Nurdle! This is an early proof experiment by TimBL.

23:04:31 <timbl> * timbl nurdles a p3pr:statement [ p:data [ p:ref :x ]] [...]
23:05:07 <sbp> nurdle? [...]
23:05:39 <DanC> danc:noodle = timbl:nurdle. it got garbled over the phone, I 
think. [...]
23:06:14 <timbl> local:nurdle =  english:cogitatesALittleAbout" [...]
23:06:49 <sbp> Thanks. I'd always wondered about nurdle.n3.py, and all I 
could find were links to Tiddlywinks...
23:06:59 <timbl> In what langauge are your personal langauges expressed?
- http://ilrt.org/discovery/chatlogs/rdfig/2001-12-05.txt

todot.n3 - converts the circles and nodes diagram things into a dot file using the --strings command line. TimBL announced it on #rdfig

CWM Web Services

There are currently a couple of Web services for CWM, including one on this very page.

SWAG Service

SWAG have set up an N3 to RDF online service, which proves to be quite popular. It can also, in fact, convert from RDF to N3, and think about the stuff, so that you can enter rules etc.

W3C Service

The W3C maintain a small service, running Notation3.py as a CGI, referenced at the top of the N3 spec., and largely obsoleted by the SWAG service.

Paste 'n' Go

Here is a form for you to paste some N3, and convert into XML RDF (powered by some Aaron Swartz magic):-

CWM Built-Ins

There are a number of modules being written for CWM that let CWM do "special" things when it finds a rule with a certain predicate in it. For example, if a rule contains "<somefile.n3> log:content ?y", then CWM will actually open up "somefile.n3" and return its content as a string literal for ?y (N.B. adding the ? before a name is a shorthand for universally quantified variables).
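
To picture how a "special" predicate can be wired up, here is a minimal sketch of a built-in registry in Python. It is an illustration of the general technique only: BUILTINS, log_content, and evaluate are hypothetical names, the real registration code in llyn.py looks different, and this version only opens local file paths rather than dereferencing URIs.

```python
LOG = "http://www.w3.org/2000/10/swap/log#"


def log_content(subject):
    """Return the text of the file named by subject (local paths only here)."""
    with open(subject, encoding="utf-8") as handle:
        return handle.read()


# Hypothetical registry: predicate URI -> function computing the object.
BUILTINS = {LOG + "content": log_content}


def evaluate(subject, predicate, bindings, variable):
    """If predicate is a known built-in, bind variable to its result."""
    function = BUILTINS.get(predicate)
    if function is None:
        # An ordinary predicate: the engine would consult the store instead.
        return False
    bindings[variable] = function(subject)
    return True
```

With this in place, evaluate("somefile.n3", LOG + "content", bindings, "?y") would leave the file's text in bindings["?y"], which mirrors the log:content example above.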

CWM_Log (built into Llyn)

The "log:" namespace is very important:-

http://www.w3.org/2000/10/swap/log#

It contains the log:implies, log:forSome, and log:forAll pseudo-properties that are used for First Order Predicate Logic. However, there are a number of other terms in the namespace that do a certain amount of stuff (from $Id: llyn.py,v 1.4 2001/11/19 15:26:14 timbl Exp $ ):-

log:implies
log:asserts
log:equalTo - BI_EqualTo(LightBuiltIn,Function, ReverseFunction)
log:notEqualTo - BI_notEqualTo(LightBuiltIn)
log:uri - BI_uri(LightBuiltIn, Function, ReverseFunction)
log:rawType - BI_rawType(LightBuiltIn, Function)
log:racine - BI_racine(LightBuiltIn, Function) [...] The resource whose URI is the same up to the "#"
log:includes - BI_includes(HeavyBuiltIn)
log:notIncludes - BI_notIncludes(HeavyBuiltIn)
log:semantics - BI_semantics(HeavyBuiltIn, Function)
log:semanticsOrError - BI_semanticsOrError(BI_semantics)
log:conclusion
log:content

Note on using some of these together:-

{ ( [ is log:semantics of <../daml-ex.n3> ] 
    [ is log:semantics of <../invalid-ex.n3> ]
    [ is log:semantics of <../schema-rules.n3> ] )
        log:conjunction [ log:conclusion :G]}
                                 log:implies { :result :is :G }. 
The above is a much more complicated way of writing the cwm command line "cwm daml-ex.n3 invalid-ex.n3 schema-rules.n3 --think".

- http://dev.w3.org/cvsweb/2000/10/swap/test/includes/conjunction.n3

CWM_String

http://www.w3.org/2000/10/swap/string#

This module contains built-ins that let you process strings. The following properties are defined:-

string:concat "is concatenation of"
string:concatenation "is the concatenation of the strings in"
string:greaterThan "is greater than"
string:notGreaterThan "is not greater than"
string:lessThan "is less than"
string:notLessThan "is not less than"
string:startsWith "starts with"
string:endsWith "ends with"

Try the string schema, and the cwm_string.py module for more information.
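
As a rough guide to what these properties mean, the comparison built-ins line up with ordinary Python string operations. The mapping below is my own sketch and not taken from cwm_string.py: the names COMPARISONS, holds, and concatenation are hypothetical, and it assumes that the subject is the left-hand operand and that strings compare the way Python compares them.

```python
import operator

STRING = "http://www.w3.org/2000/10/swap/string#"

# Comparison built-ins: each takes (subject, object) and returns True or False.
COMPARISONS = {
    STRING + "greaterThan": operator.gt,
    STRING + "notGreaterThan": operator.le,
    STRING + "lessThan": operator.lt,
    STRING + "notLessThan": operator.ge,
    STRING + "startsWith": str.startswith,
    STRING + "endsWith": str.endswith,
}


def holds(subject, predicate, obj):
    """Check whether 'subject predicate obj' is true for a comparison."""
    return COMPARISONS[predicate](subject, obj)


def concatenation(strings):
    """string:concatenation, read as: the list's strings joined in order."""
    return "".join(strings)
```

For instance, holds("semantic", STRING + "startsWith", "sem") is true, so a rule with that condition would fire.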

CWM_OS

http://www.w3.org/2000/10/swap/os#

Try the os schema, and the cwm_os.py module; it contains a property that will make CWM get the appropriate OS environment variable.

CWM_Crypto

http://www.w3.org/2000/10/swap/crypto#

The module is available as cwm_crypto.py. cf. Cryptography In CWM: Hashes. You'll need to add a couple of obvious lines to llyn.py in order to register the built-ins.

The properties that one can use at the moment are just hash functions, that is, CWM will return the hash of the string:-

   crypto:md5 a rdf:Property; rdfs:label "md5"; 
      rdfs:comment "The MD5 hash of a string"; 
      rdfs:domain string:String; rdfs:range string:String .

   crypto:sha a daml:UnambiguousProperty, 
      daml:UniqueProperty; rdfs:label "sha"; 
      rdfs:comment "The SHA hash of a string"; 
      rdfs:domain string:String; rdfs:range string:String .

Note how SHA is assigned a higher trust level than its MD5 counterpart.
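
If you want to check a hash that CWM gives you, Python's standard hashlib module computes the same kind of digests. This is a sketch under assumptions rather than a copy of cwm_crypto.py: I am taking "SHA" to mean SHA-1, encoding the string as UTF-8, and returning hexadecimal, none of which is stated above.

```python
import hashlib


def md5_hex(text):
    """MD5 digest of a string, as lowercase hexadecimal."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sha_hex(text):
    """SHA digest of a string; SHA-1 is assumed here."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


print(md5_hex("abc"))
print(sha_hex("abc"))
```

If the module's output differs from these, the encoding of the result (hex versus something else) is the first thing to check.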

CWM_Math

CWM: Mathematical Built-Ins. "CWM can now do addition, multiplication, subtraction, division, remainders, negation, exponentiation, count the members in a DAML list, and do the normal truth checking functions, only sub classed for numeric values."

For example:-

{ :x math:sumOf ([ math:quotientOf ("7" "2") ]
   [ math:exponentiationOf ([ math:remainderOf  ("7" "2")] "10000000") ]
   [ is math:memberCount of ("a" "b" "c" "d" "e") ]) } log:implies
{ :x :valueOf "(7 / 2) + ((7 % 2)^10000000) + 5 [should be 9.5]" } .

gives the correct output:-

"9.5" :valueOf "(7 / 2) + ((7 % 2)^10000000) + 5 [should be 9.5]" .

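The arithmetic in that rule can be double-checked in plain Python. This is just the sum from the label worked through step by step; the variable names are mine, and the N3 string values are written as Python numbers for brevity.

```python
quotient = 7 / 2                          # math:quotientOf ("7" "2")
power = (7 % 2) ** 10000000               # remainder is 1, and 1 to any power is 1
count = len(["a", "b", "c", "d", "e"])    # math:memberCount of the list
total = quotient + power + count
print(total)  # 3.5 + 1 + 5
```

This prints 9.5, agreeing with the value CWM binds to :x.
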
CWM_URI

In development by Mark Nottingham.

20:51:57 <mnot> I'm working on a cwm_uri module, but I need to be 
able to instantiate complex, anonymous objects based on the subject, 
so it's slow going
- http://ilrt.org/discovery/chatlogs/rdfig/2001-12-01.txt

CWM_XPath

Also in development by mnot, but this time with running code and tests! Try: CWM built-in for XPath. It requires PyXML to make it run (as does CWM in general for XML RDF processing).

Creating your own builtins

In fact, the builtin syntax is rather simple - anyone with a fundamental knowledge of Python should be able to create a new builtin module just by going through the current builtins modules.

"Real" Projects

Deploying CWM on a large scale was never really on the cards, although lately it appears to be outgrowing its "play/demonstration code" status. CWM has managed to get implemented in a few projects.

DanC's Circles and arrows diagrams using stylesheet rules
W3C Roadmap diagrams
EARL: 0.9 to 0.95, graph merging with WCAG
The Simpsons in RDF (my personal favourite)
Chords/Tab in RDF is also quite interesting
SWWS: An RDF Calendar Trial and poolGame.n3

NTriples

NTriples is a special fixed subset of Notation3; without formulae, multiple object or po combinations, multiline literals, blank bNodes, or QNames; just good ol' triples, one per line. The NTriples specification is available from the W3C, edited by Dave Beckett and Art Barstow. NTriples is more expressive than XML RDF, easy to parse, and is an excellent lowest common denominator for serializations.
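
To show just how easy to parse it is, here is a deliberately simplified line parser in Python. Treat it as a sketch, not a conforming implementation: the regular expressions are my own approximation of the grammar (they do not validate escape sequences, for example), and parse_line is a hypothetical name.

```python
import re

URI = r"<[^<>\s]*>"
BNODE = r"_:[A-Za-z][A-Za-z0-9]*"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^<>\s]*>)?'

# subject (URI or bNode), predicate (URI), object (URI, bNode, or literal), "."
TRIPLE = re.compile(
    r"^\s*(?P<s>%s|%s)\s+(?P<p>%s)\s+(?P<o>%s|%s|%s)\s*\.\s*$"
    % (URI, BNODE, URI, URI, BNODE, LITERAL)
)


def parse_line(line):
    """Return (subject, predicate, object), or None for blanks and comments."""
    if not line.strip() or line.lstrip().startswith("#"):
        return None
    match = TRIPLE.match(line)
    if match is None:
        raise ValueError("not an N-Triples line: %r" % line)
    return match.group("s", "p", "o")
```

Note that a QName such as :a is rejected, which is exactly the sort of thing that separates NTriples from full Notation3.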

Schemata That Use N3

TimBL's "log", "string", "os", and "crypto" schemata, the PIM doc and contact schemata, and the EARL 0.95 schema.

CWM Clones

Euler

The Euler proof mechanism is a proof engine written in Java by Jos De Roo, that uses Euler paths to infer without fear of endless loops. It can parse Notation3, including N3 rules.

CWMClone in Prolog

CWMClone is an implementation of CWM in Prolog, under development by Bijan Parsia. The CWMClone page itself contains some useful instructions not only on running the program, but also on rolling your own CWM.

CWMClone is a development project at this stage, but does work rather solidly.

Eep

I also wrote a CWM clone (that wasn't initially meant to be a CWM clone): Eep RDF API, Inference Engine, and NTriples/N3 Parser

Roll Your Own CWM

There are plenty of things to consider when rolling your own version of CWM, besides the obvious "how many features should I implement?". Basically, CWM is comprised of the following parts:-

A Notation3 parser (notation3.py)
An XML RDF parser (sax2rdf.py)
A general RDF API (thing.py)
An RDF triples store (llyn.py)
A Notation3 Pretty Printer (llyn.py)
A query engine (llyn.py)
An inference engine (llyn.py)
Builtins (llyn.py)

Each of these parts can be treated as essentially separate units.
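
To give a feel for how small the core can be, here is a toy triples store, query engine, and forward chaining loop in Python. It is a sketch of the general technique and says nothing about how llyn.py does it: the names match, query, and think are mine, terms are plain strings, variables are strings starting with "?", and it only terminates because these rules never invent new terms.

```python
def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Bind the variable, or fail if it is already bound differently.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(store, patterns, bindings=None):
    """Yield every set of bindings that satisfies all patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for triple in store:
        extended = match(patterns[0], triple, bindings)
        if extended is not None:
            yield from query(store, patterns[1:], extended)


def think(store, rules):
    """Apply rules until no new triples appear (forward chaining)."""
    store = set(store)
    while True:
        new = set()
        for premises, conclusions in rules:
            for b in query(store, premises):
                for c in conclusions:
                    new.add(tuple(b.get(term, term) for term in c))
        if new <= store:
            return store
        store |= new
```

A rule is a pair of (premises, conclusions), so a "grandparent" rule over two ":parent" triples adds one new triple and then stops, much like running cwm with --think on a small rules file.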

@@ TODO: more stuff in this section.

Colophon: Quotes

I hope to reach cwm-enlightenment eventually, but I'm not holding my breath. - DanC, #rdfig 2001-05-07 00:18

Waxing homepage: Sean B. Palmer
Waning homepage: Sean B. Palmer