GraphSL: RDF Graph Schema Language

Introduction

GraphSL is a schema language for RDF that lets you define sub classes of RDF graphs, against which instance graphs can be validated.

This language does not replace or extend RDFS or OWL. It uses them in places, but performs a different task. RDFS and OWL exist to document terms, give them semantics, and build ontologies; GraphSL exists to specify RDF document types, so that instances can be validated.

The closest equivalent to GraphSL is XML Schema and the other XML schema (small "s") languages. XML Schema enables people to create sub classes of XML infosets; GraphSL enables people to create sub classes of RDF graphs. It's unfortunate that the name "RDF Schema" is taken!

What GraphSL Does

Whilst RDF Schema and OWL enable one to constrain the use of properties and classes in RDF, it would also be useful to be able to place constraints on the types of triples in RDF graphs.

RDF applications from FOAF to RSS 1.0 to EARL have, at various points in their development, prompted the question of whether their documents have a particular structure. GraphSL allows you to define a graph structure without resorting to making a canonicalized subset of RDF/XML with XML Schema.

For example, what exactly is a FOAF document? Is it something which contains [ a foaf:Person ], with lots of foaf:* properties hanging off that node, possibly along with any extra data? Then we'd like to be able to say that. And in EARL, we wanted to be able to say that an EARL document is a serialization of an RDF graph that contains at least one member of the earl:Evaluation sub class of rdf:Statement. There was no schema language around at the time that could express that, and there still isn't, so this proposal aims to fill that gap.

Making a Schema

RDF Graphs are sets of statements, also known as triples. We want to be able to create particular sub classes of graphs such that they have particular constraints on the kinds of triples contained therein.
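
The N3 examples in this note omit their namespace declarations. A set that would let them parse is sketched below; the rdf:, rdfs:, owl: and foaf: namespaces are the published ones, but the namespace bound to the empty prefix is only a placeholder, since no namespace for the GraphSL terms is given here:-

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
# Hypothetical namespace for GraphSL terms such as :Graph and :schema
@prefix : <http://example.org/graphsl#> .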

For example, let's create a document class, :FOAFNameGraph, that must contain a triple that is a member of the following sub class of rdf:Statement:-

:FOAFNameStatement rdfs:subClassOf rdf:Statement, 
     [ owl:onProperty rdf:subject; owl:allValuesFrom foaf:Person ], 
     [ owl:onProperty rdf:predicate; owl:hasValue foaf:name ], 
     [ owl:onProperty rdf:object; owl:allValuesFrom rdfs:Literal ] .

And also, a link from the instance to the schema for its graph type:-

:GraphTypeStatement rdfs:subClassOf rdf:Statement, 
     [ owl:onProperty rdf:subject; owl:allValuesFrom :Document ], 
     [ owl:onProperty rdf:predicate; owl:hasValue :graphType ], 
     [ owl:onProperty rdf:object; owl:allValuesFrom :Graph ] .

So the resulting schema will be something like:-

:FOAFNameGraph rdfs:subClassOf :Graph; 
   :schema ([ :statement :FOAFNameStatement; :occurances "1" ] 
            [ :statement :GraphTypeStatement; :occurances "1" ]) .

The following is clearly a valid instance of this graph type:-

<> :graphType :FOAFNameGraph .
[ foaf:name "Bob B. Bobbington" ] .

However (and here comes the sneaky annoyance), in order to find out that the subject, for example, belongs to the class foaf:Person, we must know that foaf:name rdfs:domain foaf:Person, and then apply the rule { ?prop rdfs:domain ?x . ?subj ?prop ?objt } => { ?subj rdf:type ?x } to be able to validate it. This creates a sort of PSVI for RDF Graphs... a Post Schema Validation Graph?
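
As a sketch of that step, assuming the domain declaration is loaded alongside the instance, the inputs and the rule would be:-

foaf:name rdfs:domain foaf:Person .
_:bob foaf:name "Bob B. Bobbington" .

{ ?prop rdfs:domain ?x . ?subj ?prop ?objt } => { ?subj rdf:type ?x } .

From these a rule engine would derive the extra triple needed to check the subject constraint of :FOAFNameStatement:-

_:bob rdf:type foaf:Person .

The label _:bob is only illustrative; it stands for the blank node written as [ ... ] in the instance above.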

@@ Cardinality restrictions in OWL kinda try to do a little of what GraphSL does, but they have that non-monotonic assumption, whereas here we're free from it.

Wildcards

One way to get around this is to allow an unbounded number of other RDF statements in the graph. So basically one would say "you must have exactly one FOAFNameStatement, one GraphTypeStatement, and then any number of other rdf:Statements". The way to do that:-

:FOAFNameGraph rdfs:subClassOf :Graph; 
   :schema ([ :statement :FOAFNameStatement; :occurances "1" ] 
            [ :statement :GraphTypeStatement; :occurances "1" ] 
            [ :statement rdf:Statement; :maxOccurances "unbounded" ]) .

@@ Perhaps we should use :occurances "*" instead, and also allow ranges such as :occurances "0-15". That should make things less complicated, though it "hides" some information in literals.
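
Under that alternative, and assuming "*" is read as "zero or more" (that reading is a guess, not something settled here), the wildcard schema might be written:-

:FOAFNameGraph rdfs:subClassOf :Graph; 
   :schema ([ :statement :FOAFNameStatement; :occurances "1" ] 
            [ :statement :GraphTypeStatement; :occurances "1" ] 
            # "*" replaces :maxOccurances "unbounded"
            [ :statement rdf:Statement; :occurances "*" ]) .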

Then you can have the type information required for validation in the actual graph:-

<> :graphType :FOAFNameGraph .
[ a foaf:Person; foaf:name "Bob B. Bobbington" ] .
rdfs:Literal :isTypeOf "Bob B. Bobbington" .

@@ Note the isTypeOf hackaround.

But although wildcards are useful for other things, people won't want to include type information in instances when it can be derived from rules. So we need a different approach.

Validation Recipes

It would be possible to specify the schemata and the rules used in order to gather type data. You can do this in N3 with builtin properties such as log:concludes, but it's messy.
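
A rough sketch of what such a recipe might look like follows, using the cwm built-ins log:semantics, log:conjunction, log:conclusion and log:includes. The documents <instance.n3> and <foaf-rules.n3> are placeholders, and :typeCheckedFor is a made-up property rather than a GraphSL term:-

@prefix log: <http://www.w3.org/2000/10/swap/log#> .

# Parse the instance and the rules, merge them, and take the closure
{ <instance.n3> log:semantics ?data .
  <foaf-rules.n3> log:semantics ?rules .
  (?data ?rules) log:conjunction ?merged .
  ?merged log:conclusion ?closure .
  # Look for a typed subject carrying a foaf:name in the closure
  ?closure log:includes { ?s rdf:type foaf:Person; foaf:name ?name } }
   => { <instance.n3> :typeCheckedFor :FOAFNameStatement } .

This only tests that a matching statement exists after the rules have run; it does not count occurrences, which is part of why the approach gets messy.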

@@ More.

Going Further: Deploying in Real-World Situations

FOAF Example

Now for an actually useful real-world example. Say we want to state that FOAF documents are those documents which have: a) exactly one ?x a foaf:Person statement, b) one or more ?x foaf:* ?? statements. We've introduced labelled and unlabelled variables here, which is a bit worrying, but let's handle it as:-

:FOAFPersonStatement rdfs:subClassOf rdf:Statement, 
     [ owl:onProperty rdf:subject; 
       owl:allValuesFrom foaf:Person; :variable :x ], 
     [ owl:onProperty rdf:predicate; owl:hasValue rdf:type ], 
     [ owl:onProperty rdf:object; owl:hasValue foaf:Person ] .

:FOAFStatement rdfs:subClassOf rdf:Statement, 
     [ owl:onProperty rdf:subject; 
       owl:allValuesFrom foaf:Person; :variable :x ], 
     [ :onProperty rdf:predicate; :valueFromVocab :FOAF ], 
     [ owl:onProperty rdf:object; owl:allValuesFrom :ResourceOrLiteral ] .

:FOAFDocumentGraph rdfs:subClassOf :Graph; 
   :schema ([ :statement :FOAFPersonStatement; :occurances "1" ] 
            [ :statement :FOAFStatement; :occurances "1-*" ]) .
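
For illustration, an instance that this schema is meant to accept might look like the following. The mailbox address is made up, and the sketch assumes that foaf:name and foaf:mbox both count as terms from the :FOAF vocabulary:-

[ a foaf:Person; 
  foaf:name "Bob B. Bobbington"; 
  foaf:mbox <mailto:bob@example.org> ] .

The single blank node plays the role of :x: it is the subject of the one rdf:type statement and of both foaf:* statements. A graph whose foaf:* statements hung off some other node would presumably fail to bind :x consistently, and so would not match.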

@@ Can we use owl:onProperty still with :valueFromVocab? @@ Note the ResourceOrLiteral class--nothing's easy in RDF.

OWL Example

Even the OWL language reference has something which could be better expressed in GraphSL (@@ but they probably do it with restrictions anyway, though that's in conflict with the following definition, which is actually a constraint on syntax, so it's even harsher than what GraphSL can provide).

<owl:Restriction>
  <owl:onProperty rdf:resource="(some property)" />
  (precisely one value or cardinality constraint, see below)
</owl:Restriction>

We need to introduce OR into GraphSL, but thankfully OWL already provides that in the way of owl:unionOf.

:OWLOnPropStatement rdfs:subClassOf rdf:Statement, 
     [ owl:onProperty rdf:subject; 
       owl:allValuesFrom owl:Restriction; :variable :x ], 
     [ owl:onProperty rdf:predicate; owl:hasValue owl:onProperty ], 
     [ owl:onProperty rdf:object; owl:allValuesFrom rdf:Property ] .

:OWLHasValueStatement rdfs:subClassOf rdf:Statement, 
     [ owl:onProperty rdf:subject; 
       owl:allValuesFrom owl:Restriction; :variable :x ], 
     [ owl:onProperty rdf:predicate; owl:hasValue owl:hasValue ], 
     [ owl:onProperty rdf:object; owl:allValuesFrom :ResourceOrLiteral ] .

:OWLCardinalityStatement rdfs:subClassOf rdf:Statement, 
     [ owl:onProperty rdf:subject; 
       owl:allValuesFrom owl:Restriction; :variable :x ], 
     [ owl:onProperty rdf:predicate; owl:hasValue owl:cardinality ], 
     [ owl:onProperty rdf:object; owl:someValuesFrom rdfs:Literal ] .

# @@ there're more value and cardinality properties; this is just an example

:OWLRestriction rdfs:subClassOf :Graph; 
   :schema ([ :statement [ rdfs:subClassOf rdf:Statement, 
                 [ owl:onProperty rdf:subject; 
                   owl:allValuesFrom owl:Restriction; :variable :x ], 
                 [ owl:onProperty rdf:predicate; owl:hasValue rdf:type ], 
                 [ owl:onProperty rdf:object; owl:hasValue owl:Restriction ] ]; 
              :occurances "1" ] 
            [ :statement :OWLOnPropStatement; :occurances "1" ] 
            [ :statement [ owl:unionOf (:OWLHasValueStatement
                                        :OWLCardinalityStatement) ]; 
              :occurances "1" ]) .
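
As a sketch of what this schema is meant to accept and reject, consider the two restrictions below, each taken as a separate graph (and leaving aside the type inference needed to know that foaf:name is an rdf:Property). The first has exactly one value-or-cardinality statement; the second has two, and so exceeds the :occurances "1" placed on the union:-

# Intended to match :OWLRestriction
[ a owl:Restriction; 
  owl:onProperty foaf:name; 
  owl:cardinality "1" ] .

# Intended not to match: owl:hasValue and owl:cardinality are both present
[ a owl:Restriction; 
  owl:onProperty foaf:name; 
  owl:hasValue "Bob B. Bobbington"; 
  owl:cardinality "1" ] .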

Note that the statement declaration for dealing with [ a owl:Restriction ] statements is anonymous in the :OWLRestriction!:schema list; just to show that it can be done. It could've been called :OWLRestrictionTypeStatement and followed the same format as the others.

@@ Perhaps [ :variable :x ] really just means :x rdf:type :OccursAsSubjectOnce, and then you reuse :x. Hmm... no.

Example: Merging FOAF and OWL

This is a contrived example, but we might want to do graph sub class merges, such as:-

:FOAFDocumentGraph rdfs:subClassOf :Graph; 
   :schema ([ :statement :FOAFPersonStatement; :occurances "1" ] 
            [ :statement :FOAFStatement; :occurances "1-*" ]) .

:OWLRestriction rdfs:subClassOf :Graph; 
   :schema ([ :statement :OWLRestrictionTypeStatement; :occurances "1" ] 
            [ :statement :OWLOnPropStatement; :occurances "1" ] 
            [ :statement :OWLHasValueStatement; :occurances "1" ] 
            [ :statement :OWLCardinalityStatement; :occurances "1" ]) .

:FOAFPlusOWLThing rdfs:subClassOf :Graph; 
   :imports (:FOAFDocumentGraph :OWLRestriction) .

Which makes :FOAFPlusOWLThing like this:-

:FOAFPlusOWLThing rdfs:subClassOf :Graph; 
   :schema ([ :statement :FOAFPersonStatement; :occurances "1" ] 
            [ :statement :FOAFStatement; :occurances "1-*" ] 
            [ :statement :OWLRestrictionTypeStatement; :occurances "1" ] 
            [ :statement :OWLOnPropStatement; :occurances "1" ] 
            [ :statement :OWLHasValueStatement; :occurances "1" ] 
            [ :statement :OWLCardinalityStatement; :occurances "1" ]) .

The rule for flattening imports:-

{ ?x :imports [ rdf:member [ :schema [ rdf:member ?y ] ] ] } 
   => { ?x :schema [ rdf:member ?y ] } .

@@ Is there any better way of handling this?

GraphSL for GraphSL

We would, of course, want to show that GraphSL is at least expressive enough to describe GraphSL document graphs. Such a graph would be pretty complex, since handling rdf:Lists of arbitrary length precisely is difficult. We can, of course, be quite loose about things, and just say that there must be one or more [ :statement [] ] and [ :occurances [] ] statements, or one [ :imports [] ] statement, and then put flexible constraints on the rdf:List machinery. Not too difficult, but a hefty example.

Sean B. Palmer