Executing a SPARQL Query from QueryPath

Submitted by matt on Thu, 2009-05-28 18:07

The Semantic Web. It is a concept that has sparked heated debate for years. While the debate may continue to rage for some time, a host of technologies are already available for building advanced applications on top of XML. In this article, we will see how the SPARQL query language can be used to retrieve information, returned as XML, from remote semantic databases (usually called SPARQL endpoints).
QueryPath already contains all of the tools necessary for running a SPARQL query and handling the results. This is not because QueryPath has been specially fitted to the task, but because SPARQL uses technologies that are widely supported: XML and HTTP. Since QueryPath can be used to make HTTP requests and then digest the XML results, we can use it to execute SPARQL queries and handle the results.
In this article, we will look at a basic SPARQL query, and see how we can use QueryPath to execute it and parse the returned results.
While SPARQL will be introduced here, it is far too rich a language to be covered fully in a short article. A good starting point is the SPARQL Working Group home page.
The queries presented in this article will be run against DBpedia, a semantic version of Wikipedia. It extracts structured content from Wikipedia and makes it available as semantic data.

The SPARQL Query: A Brief Anatomy

Let's begin by looking at the SPARQL query that we will be running:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri ?name ?label
WHERE {
  ?uri foaf:name ?name .
  ?uri rdfs:label ?label
  FILTER (?name = "The Beatles")
  FILTER (lang(?label) = "en")
}

The query above begins by defining two prefixes:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

A prefix is a convenient way of representing a namespace URI with a short string. Above, we create one for the Friend of a Friend namespace (foaf:) and one for the RDF Schema namespace (rdfs:). Now, whenever we need to refer to entities from those two schemata, we can use the short prefix instead of the full namespace URI.
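Expanding a prefixed name is essentially string concatenation: the prefix is replaced by its namespace URI and the local part is appended. The following is a minimal PHP sketch of that idea; the helper name expand_prefixed_name() is hypothetical, and it ignores the escaping rules a real SPARQL parser applies.

```php
<?php
// Sketch: expand a SPARQL-style prefixed name (e.g. "foaf:name") into the
// full URI it abbreviates. expand_prefixed_name() is a hypothetical helper;
// it ignores the escaping rules a real SPARQL parser would apply.

function expand_prefixed_name($name, array $prefixes)
{
    $pos = strpos($name, ':');
    if ($pos === false) {
        throw new InvalidArgumentException("Not a prefixed name: $name");
    }
    $prefix = substr($name, 0, $pos);   // e.g. "foaf"
    $local  = substr($name, $pos + 1);  // e.g. "name"

    if (!isset($prefixes[$prefix])) {
        throw new InvalidArgumentException("Undeclared prefix: $prefix");
    }
    // The full URI is the namespace URI followed by the local part.
    return $prefixes[$prefix] . $local;
}

// The same two declarations as the PREFIX lines in the query.
$prefixes = array(
    'foaf' => 'http://xmlns.com/foaf/0.1/',
    'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
);

echo expand_prefixed_name('foaf:name', $prefixes), "\n";  // http://xmlns.com/foaf/0.1/name
echo expand_prefixed_name('rdfs:label', $prefixes), "\n"; // http://www.w3.org/2000/01/rdf-schema#label
```

So foaf:name in the query is just shorthand for http://xmlns.com/foaf/0.1/name.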
The next part of the code above is the actual query:

SELECT ?uri ?name ?label
WHERE {
  ?uri foaf:name ?name .
  ?uri rdfs:label ?label
  FILTER (?name = "The Beatles")
  FILTER (lang(?label) = "en")
}

We are going to use the URI a lot, and it is easy to get hung up on thinking of it as a URL that expresses a location. You are better off thinking of the URI as a unique identifier for an object, one that just happens to also be "dereferenceable": in this case, we can in fact use the URI to retrieve information about the object over the network.
If you have written SQL before, the query should look vaguely familiar. It functions similarly to a SQL SELECT statement. Here's what the code above does, phrased in plain English:

Select the uri, name, and label
where...
the uri has the name ?name (or, where the uri's name is stored in ?name)
the uri has a label ?label
the name is "The Beatles"
the language of the label is English

There are a few things to note about the structure of the query.
First, remember that the URI (?uri) is just a unique identifier. It functions somewhat like a primary key for each object we query.
Second, the items that begin with question marks (?) are variables. Their values are bound when the query is executed.
Third, the items in the WHERE clause are not simply restrictive, as they are in SQL. The purpose of lines 3 and 4 of the query (the two triple patterns) is not so much to limit the items returned as to express relationships between items. The general pattern of lines 3 and 4 is:

?subject ?relationship ?object

So ?uri foaf:name ?name can be understood to mean "some object ID (subject) is named (relationship) some name (object)". As you may have guessed, foaf:name expresses the relationship "is named". Likewise, rdfs:label expresses the relationship "is labeled".
Without the two FILTER clauses, the query would simply return every object that has both a name and a label, together with those names and labels (one result for each combination of name and label).
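To make triple patterns concrete, here is a toy evaluator in PHP that matches the two patterns from the WHERE clause against a few in-memory triples and joins them on the shared ?uri variable (a simple nested-loop join). It only illustrates the semantics and is not how DBpedia's endpoint works; the triples, the example.org URIs, and the function names match_pattern() and match_all() are all hypothetical.

```php
<?php
// Toy illustration of triple-pattern matching; NOT how a real SPARQL engine
// (or DBpedia) works. All triples and example.org URIs are made-up samples,
// and the function names are hypothetical.

// Each triple is array(subject, predicate, object).
$triples = array(
    array('http://example.org/band/1', 'foaf:name', 'Band One'),
    array('http://example.org/band/1', 'rdfs:label', 'Band One (label)'),
    // band/2 has a name but no label, so it cannot satisfy both patterns.
    array('http://example.org/band/2', 'foaf:name', 'Band Two'),
);

// Match one pattern against every triple, extending an existing binding.
// Terms beginning with "?" are variables; anything else must match exactly.
function match_pattern(array $triples, array $pattern, array $binding)
{
    $results = array();
    foreach ($triples as $triple) {
        $candidate = $binding;
        $ok = true;
        for ($i = 0; $i < 3; ++$i) {
            $term = $pattern[$i];
            if ($term[0] === '?') {
                // A variable must agree with any value it is already bound to.
                if (isset($candidate[$term]) && $candidate[$term] !== $triple[$i]) {
                    $ok = false;
                    break;
                }
                $candidate[$term] = $triple[$i];
            } elseif ($term !== $triple[$i]) {
                $ok = false;
                break;
            }
        }
        if ($ok) {
            $results[] = $candidate;
        }
    }
    return $results;
}

// Apply the patterns one after another (a nested-loop join): a variable
// shared between patterns, such as ?uri, links the matches together.
function match_all(array $triples, array $patterns)
{
    $solutions = array(array()); // start with one empty binding
    foreach ($patterns as $pattern) {
        $next = array();
        foreach ($solutions as $binding) {
            $next = array_merge($next, match_pattern($triples, $pattern, $binding));
        }
        $solutions = $next;
    }
    return $solutions;
}

$solutions = match_all($triples, array(
    array('?uri', 'foaf:name', '?name'),
    array('?uri', 'rdfs:label', '?label'),
));

foreach ($solutions as $s) {
    echo $s['?uri'], ' | ', $s['?name'], ' | ', $s['?label'], "\n";
}
```

Only band/1 survives, because band/2 has a name but no label. With no filters, that is exactly the "every object with a name and a label" behaviour described above.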
A FILTER clause restricts which results are returned. Above, we used two filters:

FILTER (?name = "The Beatles")
FILTER (lang(?label) = "en")

The first filter says that the value of ?name must match the string "The Beatles" exactly. Keep in mind that a given resource may have several foaf:name values. Each name produces a separate candidate result and only those whose name matches survive, so the resource is returned as long as one of its names matches.
The second filter requires that the label be in English. Labels (rdfs:label values) in DBpedia usually carry a language tag, such as "en" or "de", and the lang() function returns that tag. We are only interested in English-language content; if we omit this filter, we will also see results in Chinese, German, Spanish, and other languages.
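The effect of the two filters can be sketched the same way: FILTER removes candidate results one row at a time, and a row survives only if every filter is true. In this hypothetical PHP sketch, a label is modelled as a (text, language tag) pair so that lang(?label) becomes a lookup of the tag; the sample rows and example.org URIs are made up.

```php
<?php
// Sketch of the effect of the two FILTER clauses on candidate results.
// The rows are hypothetical sample data. Names are plain strings; each
// label is modelled as array(text, language tag), mirroring lang(?label).

$candidates = array(
    array('?uri' => 'http://example.org/band/1', '?name' => 'The Beatles', '?label' => array('label in English', 'en')),
    array('?uri' => 'http://example.org/band/1', '?name' => 'The Beatles', '?label' => array('label in German', 'de')),
    // Same resource, second name: this row is dropped, but band/1 still
    // appears through the first row.
    array('?uri' => 'http://example.org/band/1', '?name' => 'Beatles', '?label' => array('label in English', 'en')),
    array('?uri' => 'http://example.org/band/2', '?name' => 'Some Other Band', '?label' => array('label in English', 'en')),
);

// FILTER (?name = "The Beatles"): exact string comparison.
$filter_name = function ($s) {
    return $s['?name'] === 'The Beatles';
};

// FILTER (lang(?label) = "en"): keep only labels tagged as English.
$filter_lang = function ($s) {
    return $s['?label'][1] === 'en';
};

// A row survives only if every filter evaluates to true.
$results = array_values(array_filter($candidates, function ($s) use ($filter_name, $filter_lang) {
    return $filter_name($s) && $filter_lang($s);
}));

foreach ($results as $s) {
    echo $s['?uri'], ' | ', $s['?name'], ' | ', $s['?label'][0], "\n";
}
```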
Putting this all together, then, our query will return the URI, the name, and the label for any URIs in the database that...

Have a name
Have a label
Have a name that is "The Beatles"
Have a label that is in English.

Next, we're ready to see how this query can be run against a remote, publicly available SPARQL endpoint (server) from QueryPath.

Running the Query

The query is, by far, the most complex aspect of our sample code. Here's what the entire code looks like:

<?php
require '../src/QueryPath/QueryPath.php';
 
// We are using the DBpedia database to execute a SPARQL query.
 
// URL of DBpedia's SPARQL endpoint.
$url = 'http://dbpedia.org/sparql';
 
// The SPARQL query to run.
$sparql = '
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT ?uri ?name ?label
  WHERE {
    ?uri foaf:name ?name .
    ?uri rdfs:label ?label
    FILTER (?name = "The Beatles")
    FILTER (lang(?label) = "en")
  }
';
 
// We first set up the parameters that will be sent.
$params = array(
  'query' => $sparql,
  'format' => 'application/sparql-results+xml',
);
 
// DBpedia accepts the query as an HTTP GET request, so we append the
// URL-encoded parameters to the endpoint URL.
$data = http_build_query($params);
$url .= '?' . $data;
 
// Next, we retrieve and parse the results, selecting the <head> element
// that lists the result variables.
$qp = qp($url, 'head');
 
// Get the headers from the resulting XML.
$headers = array();
foreach ($qp->children('variable') as $col) {
  $headers[] = $col->attr('name');
}
 
// Get rows of data from result.
$rows = array();
$col_count = count($headers);
foreach ($qp->top()->find('results>result') as $row) {
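  $cols = array();
  // children() moves $row to the <binding> elements inside this <result>.
  // branch() copies that selection, so eq($i) does not alter $row itself.
  // Bindings are read by position, assuming each result binds every
  // variable in the same order as the headers.
  $row->children();
  for ($i = 0; $i < $col_count; ++$i) {
    $cols[$i] = $row->branch()->eq($i)->text();
  }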
  $rows[] = $cols;
}
 
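// Turn data into table. Values are escaped with htmlspecialchars() so that
// characters such as & and < in the results cannot break the markup.
$table = '<table><tr><th>' . implode('</th><th>', array_map('htmlspecialchars', $headers)) . '</th></tr>';
foreach ($rows as $row) {
  $table .= '<tr><td>';
  $table .= implode('</td><td>', array_map('htmlspecialchars', $row));
  $table .= '</td></tr>';
}
$table .= '</table>';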
 
// Add table to HTML document.
qp(QueryPath::HTML_STUB, 'body')->append($table)->writeHTML();
?>
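The listing above reads each row's bindings by position (eq($i)), which assumes every variable is bound and that bindings appear in header order. In the SPARQL XML results format, however, each <binding> element carries a name attribute, the format does not require bindings to follow header order, and an unbound variable simply has no <binding>. As a hedged alternative, here is a sketch that uses only PHP's built-in DOM extension (no QueryPath) and looks bindings up by name. The helper parse_sparql_xml() and the sample document are hypothetical, written for illustration rather than captured from DBpedia.

```php
<?php
// Sketch: parse a SPARQL XML results document with PHP's built-in DOM
// extension, looking bindings up by name rather than by position.
// parse_sparql_xml() is a hypothetical helper; the sample XML is hand-written.

function parse_sparql_xml($xml)
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Could not parse the SPARQL results XML');
    }

    // Every element in this format is in the W3C SPARQL results namespace.
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('s', 'http://www.w3.org/2005/sparql-results#');

    // Column names come from <head><variable name="..."/></head>.
    $headers = array();
    foreach ($xpath->query('/s:sparql/s:head/s:variable') as $var) {
        $headers[] = $var->getAttribute('name');
    }

    $rows = array();
    foreach ($xpath->query('/s:sparql/s:results/s:result') as $result) {
        // Unbound variables have no <binding>, so default every column to ''.
        $row = array_fill_keys($headers, '');
        foreach ($xpath->query('s:binding', $result) as $binding) {
            // The value sits in a child <uri>, <literal> or <bnode> element.
            $value = $xpath->query('s:*', $binding)->item(0);
            $row[$binding->getAttribute('name')] = $value ? $value->textContent : '';
        }
        $rows[] = $row;
    }

    return array($headers, $rows);
}

// Hand-written sample in the SPARQL XML results format (not real DBpedia output).
$sample = <<<'XML'
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="uri"/>
    <variable name="name"/>
    <variable name="label"/>
  </head>
  <results>
    <result>
      <binding name="uri"><uri>http://example.org/band/1</uri></binding>
      <binding name="name"><literal>The Beatles</literal></binding>
      <binding name="label"><literal xml:lang="en">The Beatles</literal></binding>
    </result>
    <result>
      <binding name="name"><literal>The Beatles</literal></binding>
      <binding name="uri"><uri>http://example.org/band/2</uri></binding>
    </result>
  </results>
</sparql>
XML;

list($headers, $rows) = parse_sparql_xml($sample);
foreach ($rows as $row) {
    echo implode(' | ', $row), "\n";
}
```

The second sample result lists its bindings out of order and has no label, yet it still lands in the right columns. Because each row is keyed in header order, it can be passed to the same escaped table-building code as before.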

Programming resources

Sunday, 13 November 2011