SPARQL 1.1 Becomes Official Recommendation Status

For a couple of us “propeller heads” here at YarcData, the SPARQL 1.1 specification becoming a W3C Recommendation was a semi-earth-shattering moment. Yes, I’m aware of how that sounds; however, please bear with me a moment while I share a bit of my excitement.

I started working with ontologies back in 2001 and have come to love this form of knowledge representation and the semantic web technologies that accompany it. Being able to describe the world in terms that are both adequately descriptive and elegant is a bit of a geek love of mine. Ever since I discovered RDF and SPARQL, I have been doing what I can, in my own little corner of the world (no jokes about that being a closet, please), to advance this technology, which has such high potential to change the way we think about and interact with computers.

These days, I am but one member of a team of implementers of this standard. While I am not a member of the W3C SPARQL Working Group, and thus not an authoritative source of information on its activity, I can tell you that work on SPARQL 1.1 has been in progress for well over three years (the first draft was published on October 22, 2009; the Recommendation on March 21, 2013).

Coming from an RDBMS background as an applications developer and DBA, I enjoy working with SPARQL, looking for elegant ways to query data, and exploring the nuances of the language and its companion, RDF.

So, bear with me a moment longer while I share a “where were you when [X] happened” moment. Yesterday, a colleague, Rob Vesse, and I were filming a new video for our website. I was at the back of the room while he shot a segment in which he mentioned in passing that SPARQL 1.1 was still in “Proposed Recommendation” status. I thought to myself, “Wouldn’t it be kind of funny if they released the final version today?” After all, that would have a fairly large impact on our jobs as implementers of this standard.

One of the more challenging tasks for any developer is designing and developing against a moving target, so we had been anxiously awaiting the day SPARQL 1.1 would become a W3C “Recommendation,” the final stage of a specification’s journey through the W3C standards process. So, as soon as he wrapped his shoot, I grabbed my laptop to check, and sure enough, SPARQL 1.1 had been published as a final Recommendation perhaps only hours earlier. When I shared the news with Rob, it drew a rather funny “You’ve got to be kidding me” from him, since he didn’t want to go back and redo his whole spiel.

As a data geek, I dream of a day when people and computers can talk to each other in a language free of ambiguity, holes, dead ends and “No results found.” So pardon me for getting my geek on over what I think may someday be seen as a small milestone toward that vision of the future, and thanks for hanging in with me while I did.

With that I bid you “query on…”

Using Venn Diagrams to Understand Named Graphs

Many people are beginning to recognize SPARQL for its many strengths as a query language. But did you know that drawing your graphs as a Venn diagram can make working with named graphs much easier to understand?

Let’s start by arranging our data store according to this simple Venn diagram.

NOTE: Here I’ve chosen to put everything that is not in set A or B into the default graph, for illustrative purposes.

Let’s put some triples into each of the graphs shown, deliberately making them overlap, using the following data:

NOTE: The named graph URIs correspond to the sets in the diagram.
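Since a Venn diagram is really a picture of sets, it can help to write the arrangement down as plain sets. Here is a minimal Python sketch, assuming each graph is modelled as a set of (subject, predicate, object) tuples. The IRIs http://example.org/A and http://example.org/B and all of the triples are hypothetical stand-ins chosen to overlap, not the data from the diagram.

```python
# Hypothetical IRIs standing in for sets A and B in the Venn diagram.
A = "http://example.org/A"
B = "http://example.org/B"

# Each named graph is a set of (subject, predicate, object) triples.
# The ":shared" triple is deliberately placed in both graphs.
named_graphs = {
    A: {(":alice", ":knows", ":bob"), (":shared", ":in", ":both")},
    B: {(":carol", ":knows", ":dave"), (":shared", ":in", ":both")},
}

# Everything outside A and B lives in the default (unnamed) graph.
default_graph = {(":erin", ":knows", ":frank")}

# The overlapping region of the diagram is just the set intersection.
overlap = named_graphs[A] & named_graphs[B]
print(sorted(overlap))
```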

So, once we have loaded this data into our store, we should be able to retrieve any region of the Venn diagram. Here’s how:

This last operation uses SPARQL’s basic join to select all triples whose ?s, ?p and ?o are common to both graphs, which is exactly the intersection of A and B.
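To see why each region falls out of ordinary set logic, here is a Python sketch that computes "A only", "B only", A ∪ B and A ∩ B for two hypothetical graphs, then recomputes the intersection the way a join does: by pairing solutions that agree on ?s, ?p and ?o. The data and the helper names match_spo and join are made up for illustration; this models the semantics rather than calling a real SPARQL engine.

```python
# Hypothetical triples; the ":shared" triple is deliberately in both graphs.
graph_a = {(":alice", ":knows", ":bob"), (":shared", ":in", ":both")}
graph_b = {(":carol", ":knows", ":dave"), (":shared", ":in", ":both")}

only_a = graph_a - graph_b   # A minus B (SPARQL 1.1 could use MINUS or FILTER NOT EXISTS)
only_b = graph_b - graph_a   # B minus A
either = graph_a | graph_b   # A union B (SPARQL UNION, plus DISTINCT to drop the duplicate)
both = graph_a & graph_b     # A intersect B


def match_spo(graph):
    """Solutions of the pattern { ?s ?p ?o } against one graph: one binding dict per triple."""
    return [{"s": s, "p": p, "o": o} for (s, p, o) in graph]


def join(left, right):
    """SPARQL-style join: merge pairs of solutions that agree on every shared variable."""
    result = []
    for l in left:
        for r in right:
            if all(l[v] == r[v] for v in l.keys() & r.keys()):
                result.append({**l, **r})
    return result


# Joining on ?s, ?p and ?o keeps only triples present in both graphs.
joined = join(match_spo(graph_a), match_spo(graph_b))
print(sorted((m["s"], m["p"], m["o"]) for m in joined))
```

Because every variable is shared between the two patterns, the only compatible pairs are identical triples, which is why the join and the set intersection agree.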

As you can see, RDF and SPARQL handle basic set operations such as union, intersection and difference quite naturally. The Venn diagram also shows that, notionally, when we assign some triples to a named graph, it is as if we are drawing a boundary around them, which is often referred to as coloring the graph. So, in effect, named graphs are a technique for coloring graphs.

A Narrative for Understanding SPARQL’s FROM, FROM NAMED and GRAPH

When I first began learning SPARQL, I was fortunate enough not to have to tackle the named graphs concept of RDF and SPARQL at the same time. My queries were limited entirely to the default graph, and I had some time to learn SPARQL queries before moving on. In fact, in those days there were very few educational resources on either RDF or SPARQL, and I learned mostly by example. But then I needed to tackle named graphs, and I recall just how difficult it was to understand how SPARQL’s constructs for working with named graphs behave. Later, after I joined YarcData and became active on “Semantic Overflow” (which now exists as the Q&A board at semanticweb.com, http://answers.semanticweb.com), I realized that many smart people also struggle with the concept for some time before they come to grips with it. Here I’ll describe how I came to understand graphs and provide some resources that will hopefully help you as well.

We have to think carefully about what data our query is being applied to. A simple query that uses none of the keywords under examination is always applied to the RDF dataset (the in-memory model, locally persisted model or remote data store that is the target of the query). Take, for example, a query such as SELECT * WHERE { ?s ?p ?o }.

In this case, the query is applied to our data store without modification, and since there is no GRAPH keyword, it is matched against the “unnamed” or “default” graph (more on that later). However, if we add FROM, FROM NAMED or both, our mental model should replace the store’s RDF dataset, as the target of the query, with a new temporary dataset described by the FROM and/or FROM NAMED clauses.

There are two things we must realize about their semantics:

  1. FROM builds only the default graph (of our temporary store)
  2. FROM NAMED builds only the named graphs

Examine the following example diagram to see how this works. Let’s begin with a target store that contains some triples in the unnamed graph and also some triples in each of graphs A through C (recall that an RDF dataset contains exactly one unnamed, or default, graph and zero or more named graphs).

Some things to note from the above diagram:

  • Row 1 has no FROM or FROM NAMED clauses, so we query the RDF dataset as is.
  • Row 2 says the dataset we query will have an unnamed graph containing the triples from the original dataset’s named graph <http://a>, plus a named graph <http://b> containing the triples from the original dataset’s named graph <http://b>.
  • Row 3 says the dataset we query will have an unnamed graph containing the union of the triples from <http://a> and <http://b>, and no named graphs at all.
  • Row 4 is similar to row 3, only here we are saying to get our named graphs A and B from the original dataset’s A and B.
  • Finally, row 5 says to load the triples from the original dataset’s A, B and C graphs into our unnamed graph and not to create any named graphs.
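The two rules from the numbered list can be modelled in a few lines. This Python sketch is a simplification under stated assumptions: the query_dataset helper and the store contents are hypothetical, graph IRIs the store does not hold are simply skipped, and the default graph is built as a plain set union (a real RDF merge also keeps blank nodes from different graphs apart). It rebuilds rows 1, 2, 3 and 5 using the FROM and FROM NAMED clauses their descriptions imply.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class Dataset:
    """One unnamed (default) graph plus zero or more named graphs."""
    default: set[Triple] = field(default_factory=set)
    named: dict[str, set[Triple]] = field(default_factory=dict)


def query_dataset(store: Dataset, from_iris=(), from_named_iris=()) -> Dataset:
    """Build the temporary dataset described by FROM / FROM NAMED (simplified model)."""
    if not from_iris and not from_named_iris:
        # No dataset clauses: this sketch queries the store as is
        # (in real systems that default dataset is implementation-defined).
        return store
    # FROM builds only the default graph: the union of the listed graphs.
    default = set().union(*(store.named.get(iri, set()) for iri in from_iris))
    # FROM NAMED builds only the named graphs, keeping their names.
    # Assumption: IRIs missing from the store are skipped.
    named = {iri: set(store.named[iri]) for iri in from_named_iris if iri in store.named}
    return Dataset(default, named)


# Hypothetical store: a default graph plus named graphs a, b and c.
store = Dataset(
    default={(":d", ":p", ":o")},
    named={
        "http://a": {(":a", ":p", ":o")},
        "http://b": {(":b", ":p", ":o")},
        "http://c": {(":c", ":p", ":o")},
    },
)

row1 = query_dataset(store)                                          # no clauses
row2 = query_dataset(store, ["http://a"], ["http://b"])              # FROM a, FROM NAMED b
row3 = query_dataset(store, ["http://a", "http://b"])                # FROM a, FROM b
row5 = query_dataset(store, ["http://a", "http://b", "http://c"])    # FROM a, b, c

for label, ds in [("row 1", row1), ("row 2", row2), ("row 3", row3), ("row 5", row5)]:
    print(label, sorted(ds.default), sorted(ds.named))
```

Note that a query with FROM NAMED but no FROM ends up with an empty default graph in this model, mirroring the rule that FROM NAMED builds only the named graphs.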

Now that we know how FROM and FROM NAMED allow us to build the RDF dataset we will query, we have only to understand how our basic graph patterns allow us to select data from the query dataset.

Consider a simple query such as SELECT * WHERE { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }, which queries all data from all graphs in the dataset we specified with any FROM or FROM NAMED keywords. The portion { ?s ?p ?o } queries all triples in the unnamed graph. The portion GRAPH ?g { ?s ?p ?o } queries all named graphs (since ?g is unbound), and because we UNION the two, we get all triples in the unnamed graph plus those in the named graphs. If no FROM or FROM NAMED is specified in the query, this query will return all data in your RDF dataset (beware: if you have a very large persisted store, you may be in for a very long wait!).
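Here is a Python sketch of how that UNION query is evaluated against a query dataset, assuming the dataset is passed as a default-graph set plus a dict of named graphs. The select_everything helper and the data are hypothetical; the point is that the left branch only ever sees the default graph and leaves ?g unbound, while the right branch binds ?g once per named graph.

```python
Triple = tuple[str, str, str]


def select_everything(default: set[Triple], named: dict[str, set[Triple]]) -> list[dict[str, str]]:
    """Model of SELECT * WHERE { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }."""
    rows = []
    # Left branch: outside GRAPH, the pattern sees only the default graph; ?g stays unbound.
    for s, p, o in sorted(default):
        rows.append({"s": s, "p": p, "o": o})
    # Right branch: GRAPH ?g tries every named graph and binds ?g to its name.
    for g in sorted(named):
        for s, p, o in sorted(named[g]):
            rows.append({"g": g, "s": s, "p": p, "o": o})
    return rows


# Hypothetical query dataset: one default-graph triple and two named graphs.
default = {(":d", ":p", ":o")}
named = {"http://a": {(":a", ":p", ":o")}, "http://b": {(":b", ":p", ":o")}}

for row in select_everything(default, named):
    print(row)
```

Passing an empty dict of named graphs, as a FROM-only query would produce, leaves just the default-graph rows.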

Now let’s look at the semantics of the GRAPH keyword. GRAPH takes either a variable or an IRI. If it is given an unbound variable, the triple pattern is applied to every named graph in the query dataset, and the variable (?g in our examples) is bound to the name of each graph that has matching triples. If it is given an IRI, the triple pattern is applied only to the named graph identified by that IRI, and only if that graph is part of the query dataset; otherwise the pattern matches nothing. Some interesting consequences of this are that a GRAPH pattern never matches triples in the default graph, and that a query whose dataset is built with FROM alone has no named graphs, so any GRAPH pattern in it returns no results.
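The GRAPH rules can be sketched the same way. In this hypothetical Python model, graph=None stands in for an unbound ?g and a string stands in for an IRI; the graph_pattern helper, the store contents and the graph names are all assumptions made for illustration.

```python
from typing import Optional

Triple = tuple[str, str, str]


def graph_pattern(named: dict[str, set[Triple]], graph: Optional[str]) -> list[dict[str, str]]:
    """Model of GRAPH ?g { ?s ?p ?o } (graph=None) or GRAPH <iri> { ?s ?p ?o }.

    Only the query dataset's named graphs are consulted; a GRAPH pattern
    never matches triples in the default graph.
    """
    if graph is None:
        # Unbound variable: try every named graph and bind ?g to each match.
        return [{"g": g, "s": s, "p": p, "o": o}
                for g in sorted(named) for (s, p, o) in sorted(named[g])]
    if graph not in named:
        return []  # the IRI is not a named graph of the query dataset
    return [{"s": s, "p": p, "o": o} for (s, p, o) in sorted(named[graph])]


# Hypothetical store holding two named graphs.
store_named = {"http://a": {(":a", ":p", ":o")}, "http://b": {(":b", ":p", ":o")}}

# The named graphs as FROM NAMED <http://a> alone would leave them.
only_a = {"http://a": store_named["http://a"]}

print(graph_pattern(only_a, "http://b"))  # <http://b> exists in the store, but not in this dataset
print(graph_pattern({}, None))            # FROM without FROM NAMED: no named graphs at all
```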

As I said at the outset, understanding how to query RDF datasets using SPARQL’s FROM, FROM NAMED and GRAPH keywords can be very tricky when we first begin working with SPARQL. Hopefully, this article has given you an understanding that will save you hours of frustrating trial and error. Now that we are through with this topic, see whether your understanding matches my next blog post, “A truth table for SPARQL query datasets.”

Happy SPARQL-ing!

Further Resources:
[1] SPARQL 1.1 Specification on RDF Datasets
[2] SPARQL by Example: slide 33