In June 2007, Norman Gray and Tony Linde submitted a proposal to the JISC Capital Call 01/07 to develop SKUA (Semantic Knowledge Underpinning Astronomy). In October 2007 we were advised that the project had been accepted by JISC and would be funded from Jan'08 to Jun'09 at the requested level (310K). The project will involve both principals (Norman @ 50%, Tony @ 33.3%), and we'll be looking to hire an additional PDRA (Level 7) for the 18 months. A separate website will be developed for this project, as well as an informational page at JISC, and we will post links as soon as these are available. For now, we'll post a few details from the proposal:

Executive summary

We propose the creation of a semantic infrastructure for astronomy based on the organisation of assertion services with relatively simple interfaces. Astronomy has been part of the UK's e-Science effort since its inception, most of it under the AstroGrid project. The focus of this effort, in the UK and within projects in at least 15 other countries, is the creation of a worldwide Virtual Observatory (VO), making astronomical data and applications easily available to astronomers regardless of their location and affiliation. The VO will, by defining and implementing standard interfaces, make it possible to access common resources from multiple applications. These resources are located via a globally distributed resource registry, which has been defined and in operation for over two years.

To date, relatively little work has been done within the VO effort on semantic systems development. An ontology of object types has been developed by VO-France, and one of us (Gray) has developed an access control system based on OWL inferencing and a mechanism for converting the standard VO registry XML format to RDF triples. Our project will provide a semantic infrastructure, with a toolkit and API, which will make it possible for many more VO developers to engage with Simple Knowledge Organisation Systems (SKOS). The key benefit of this proposal is that it engages with an existing, vibrant development and user community, and builds upon working infrastructure, making it possible to demonstrate and prove both concepts and tools as we develop them. In doing so, we engage with key outcomes of the Capital Programme and its e-infrastructure programme.

The core concept of SKUA is that of a Semantic Assertion Collection (SAC). A SAC is a service combining an RDF triple store with an interface providing the ability to:

  • store, modify and delete assertions (RDF triples);
  • return the result of SPARQL queries; and
  • optionally federate its queries to one or more other SACs.
This simple extension to proven tools forms the basis of an infrastructure which supports federating tags and queries across multiple collections, covering perhaps a user’s personal collection, that of a project they are working on, the department they belong to, and the worldwide VO. This allows for the construction of very personalised queries.
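As a rough illustration of the store/modify/delete/query interface listed above, here is a minimal in-memory sketch. The class name `AssertionCollection` and the `ivo://example/...` identifiers are hypothetical, and pattern matching with `None` as a wildcard stands in for SPARQL; a real SAC would sit on an existing triplestore.

```python
from typing import Optional

Triple = tuple[str, str, str]


class AssertionCollection:
    """Hypothetical in-memory stand-in for a SAC's triple store.

    Pattern matching (None = wildcard) stands in for SPARQL here.
    """

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def store(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def delete(self, s: str, p: str, o: str) -> None:
        # discard() is a no-op if the triple is absent
        self._triples.discard((s, p, o))

    def modify(self, old: Triple, new: Triple) -> None:
        # RDF triples have no identity beyond their content, so a
        # modification is a delete followed by an insert
        self.delete(*old)
        self.store(*new)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        return sorted(t for t in self._triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))


sac = AssertionCollection()
sac.store("ivo://example/cat1", "tag", "quasars")
sac.store("ivo://example/cat1", "tag", "radio")
sac.modify(("ivo://example/cat1", "tag", "radio"),
           ("ivo://example/cat1", "tag", "x-ray"))
print(sac.match(p="tag"))
# [('ivo://example/cat1', 'tag', 'quasars'), ('ivo://example/cat1', 'tag', 'x-ray')]
```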

On top of this layer of capability, we will construct a few sample applications to demonstrate some of the additional functionality that it might provide. We expect other developers to build many more such examples. This layer and the SAC components will be packaged as a toolkit for these developers. In addition we will take part in JISC and astronomy meetings to promote the technology.

Introduction

The Semantic Web has, with startling speed, graduated from wild-eyed vision to deployable engineering. The goal of letting computers ‘understand’ has solidified into established practice and competing implementations; now, with the bleeding edge moving off in yet more exotic directions, is the ideal time to bring the core technologies to practical application. Europe plays a world-leading role in the worldwide Semantic Web (SW) community, the fruit of years of heavy EC investment in the technology. The SKUA project will embed this expertise in a UK project, thus disseminating it from the UK to the worldwide VO community, and within the UK to the other metadata projects supported by the JISC.

The SKUA Project (Semantic Knowledge Underpinning Astronomy) will implement a distributed architecture of semantically aware RDF stores. This ‘semantic layer’ will support a cluster of applications which will either help users directly to find and retrieve useful resources, or help them indirectly by supporting other user-facing applications. We describe the architecture and an initial set of applications below. Although the system we build will be specialised to astronomy, and proved by its interaction with, and eventual embedding within, the Virtual Observatory, the bulk of the semantic knowledge is localised in the RDF store, with the design goal that it could be replaced, if desired, by the analogous semantic knowledge of a different domain.

SKUA architecture

Project architecture

The core component is a network of Semantic Assertion Collections (SACs) providing rather generic semantic Web Services. For performance reasons, we expect the semantic reasoning within the SACs to be rather simple, with more elaborate reasoning either performed in the background and separately asserted, or simply retained within value-adding clients. The optimal level of integration with (or even replacement of) the VO registries will become clear during the course of the project.
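One way to read "performed in the background and separately asserted" is materialisation: a background job computes entailed triples once and writes them into the store, so queries stay simple lookups. The sketch below does this for the transitivity of `rdfs:subClassOf` only; the `astro:` class names are hypothetical, and a real deployment might use a full RDFS or OWL reasoner instead.

```python
SUBCLASS = "rdfs:subClassOf"


def materialise_subclass_closure(triples: set) -> set:
    """Return subClassOf triples implied by transitivity but not yet asserted."""
    parents: dict = {}
    for s, p, o in triples:
        if p == SUBCLASS:
            parents.setdefault(s, set()).add(o)

    inferred = set()
    for cls in parents:
        # Walk up the hierarchy from cls, collecting every ancestor once
        ancestors, stack = set(), list(parents[cls])
        while stack:
            a = stack.pop()
            if a not in ancestors:
                ancestors.add(a)
                stack.extend(parents.get(a, ()))
        for a in ancestors:
            if (cls, SUBCLASS, a) not in triples:
                inferred.add((cls, SUBCLASS, a))
    return inferred


# Hypothetical class names, for illustration only
store = {
    ("astro:Quasar", SUBCLASS, "astro:AGN"),
    ("astro:AGN", SUBCLASS, "astro:Galaxy"),
}
# Run in the background; the results are then asserted like any other triples
store |= materialise_subclass_closure(store)
```

After this step, a query for subclasses of `astro:Galaxy` finds `astro:Quasar` by a direct lookup, with no reasoning at query time.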

This structure integrates with e-Infrastructure outcomes by supporting new ways of retrieving data, and by integrating with key initiatives in the wider research community.

We conceive the semantic layer as a directed acyclic graph (DAG) of SACs, each of which can store a greater or smaller number of RDF triples and, crucially, federate queries to a configurable list of partner stores, in such a way that a query against one SAC is effectively made against the RDF triples stored in that SAC and in all the SACs that it federates to (Fig. 2). Thus the personal SAC, which may be a local desktop service or a personal section of a remote service, will typically store user-specific annotations or notes, and the global SAC will store VO-wide information such as an RDF mirror of the VO Registry. Information is shared transparently by being copied from a local SAC to an appropriate one of the SACs shared within a research group or an ad-hoc group of collaborators; this copying is managed either directly by the user, through a small UI, or as part of a separate user-facing application’s functionality.
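The federation rule described above (a query against one SAC covers that SAC and everything reachable from it) amounts to a graph traversal. This sketch assumes in-process objects rather than remote SPARQL endpoints; `SAC`, `federated_match` and the example topology are hypothetical. Because the structure is a DAG, a shared store such as the global SAC can be reached by more than one path, so the traversal tracks which SACs it has already queried.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SAC:
    name: str
    triples: set = field(default_factory=set)
    partners: list = field(default_factory=list)  # SACs this one federates to


def federated_match(root: SAC, predicate: Optional[str] = None,
                    obj: Optional[str] = None) -> set:
    """Match (?s, predicate, obj) against root and every SAC reachable from it."""
    results, visited, stack = set(), set(), [root]
    while stack:
        sac = stack.pop()
        # A shared store (e.g. the global SAC) may be reached by several
        # paths through the DAG; query it only once
        if sac.name in visited:
            continue
        visited.add(sac.name)
        results |= {t for t in sac.triples
                    if (predicate is None or t[1] == predicate)
                    and (obj is None or t[2] == obj)}
        stack.extend(sac.partners)
    return results


# Hypothetical topology: personal -> project -> global, and personal -> global
global_sac = SAC("global", {("ivo://example/cat2", "tag", "quasars")})
project = SAC("project", {("ivo://example/cat1", "tag", "quasars")}, [global_sac])
personal = SAC("claire", {("arxiv:example", "tag", "quasars")}, [project, global_sac])

print(sorted(s for s, _, _ in federated_match(personal, "tag", "quasars")))
# ['arxiv:example', 'ivo://example/cat1', 'ivo://example/cat2']
```

Querying `project` instead returns only the project and global results, since federation follows the direction of the edges.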

Each SAC has a (standard) SPARQL endpoint which will respond to queries both from clients and from other SACs which federate to this one. Each SAC will also support a simple RESTful API for managing its RDF data.
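Because the endpoint is standard, any client can query a SAC using the SPARQL Protocol, which passes the query text in the `query` parameter of an HTTP GET. The sketch below only builds the request; the endpoint URL and the `http://example.org/skua#tag` property are hypothetical, and the result format requested via `Accept` is one reasonable choice among several.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request


def build_sparql_get(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request: the query goes in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url,
                   headers={"Accept": "application/sparql-results+xml"},
                   method="GET")


q = 'SELECT ?r WHERE { ?r <http://example.org/skua#tag> "quasars" }'
req = build_sparql_get("http://sac.example.org/personal/sparql", q)

# urllib.request.urlopen(req) would send it; here we just check the encoding
sent = parse_qs(urlsplit(req.full_url).query)["query"][0]
print(sent == q)  # True
```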

A SAC must not respond to queries indiscriminately, since to do so would expose possibly private annotations; each SAC will therefore keep a list of the SACs to which it has permitted federation. The topology of federations is specified exclusively by the SACs which do the federating; permission to query or to write to a SAC is the responsibility of the SAC being federated to. The VO is deploying an SSO/security infrastructure which this project would make use of. This infrastructure would handle the authentication issues involved, but we anticipate leaving SAC access control as the responsibility of the SACs themselves (either internally, or at the HTTP layer if appropriate).
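The division of labour above (authentication by the VO SSO layer, authorisation by the SAC being federated to) might look like this in outline. `GuardedSAC` and the `sac:...` caller identities are hypothetical, and the sketch assumes the caller's identity has already been authenticated before `answer` is called.

```python
class GuardedSAC:
    """Hypothetical SAC that answers federated queries only from listed partners."""

    def __init__(self, name: str, triples: set, permitted: set) -> None:
        self.name = name
        self._triples = set(triples)
        # Identities of SACs allowed to federate to this one; we assume they
        # have already been authenticated by the VO SSO layer
        self._permitted = set(permitted)

    def allows(self, caller: str) -> bool:
        return caller in self._permitted

    def answer(self, caller: str, predicate: str) -> set:
        if not self.allows(caller):
            raise PermissionError(f"{caller} may not query {self.name}")
        return {t for t in self._triples if t[1] == predicate}


project = GuardedSAC("project", {("ivo://example/cat1", "tag", "quasars")},
                     permitted={"sac:claire", "sac:department"})
print(project.answer("sac:claire", "tag"))
# {('ivo://example/cat1', 'tag', 'quasars')}
# project.answer("sac:stranger", "tag") would raise PermissionError
```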

We believe these three functions – querying, updating and sharing RDF information – will support a flexible and open-ended array of user-supporting client applications, and we will validate this assertion by developing an initial set of such applications, as described below.

The SKUA project uses standard technologies and protocols, composed in an innovative way. The SACs will build on one of several available triplestore implementations; they will be queried using the W3C-standardised SPARQL query language (http://www.w3.org/TR/rdf-sparql-query/ as at June 2007). The VO security infrastructure builds on the Shibboleth infrastructure, and so realises existing JISC investments. The simple SAC management interface will be specific to the SACs, but it will not need to go beyond the standard REST interaction pattern. Our goal is to produce a simple, open-source and easily composable Web Service, proved by applications. This builds on the PIs’ experience with generations of application and service deployments in the VO and other projects.
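To make "the standard REST interaction pattern" concrete, here is one possible mapping of HTTP verbs onto named graphs in a SAC. Everything here is an assumption for illustration: the `/graphs/...` URL layout, passing triples directly as the body, and the particular status codes; the actual SAC management interface is not specified beyond following REST.

```python
def handle(store: dict, method: str, path: str, body=None):
    """Apply one REST-style request to an in-memory dict of named graphs.

    Returns (HTTP status, response body). URL layout and body format are
    hypothetical.
    """
    if method == "GET":  # read the graph
        if path not in store:
            return 404, None
        return 200, sorted(store[path])
    if method == "PUT":  # create or replace the whole graph
        created = path not in store
        store[path] = set(body or ())
        return (201 if created else 204), None
    if method == "POST":  # add triples to the graph
        store.setdefault(path, set()).update(body or ())
        return 204, None
    if method == "DELETE":  # remove the graph
        return (204 if store.pop(path, None) is not None else 404), None
    return 405, None  # Method Not Allowed


graphs: dict = {}
handle(graphs, "PUT", "/graphs/claire/notes", [("wf:lightcurve", "tag", "variability")])
handle(graphs, "POST", "/graphs/claire/notes", [("wf:lightcurve", "tag", "photometry")])
print(handle(graphs, "GET", "/graphs/claire/notes"))
# (200, [('wf:lightcurve', 'tag', 'photometry'), ('wf:lightcurve', 'tag', 'variability')])
```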

Case studies and completion scenarios

The core of our proposal is the SAC architecture described in Sect. 3.1. The SAC servers will comprise a relatively thin layer on top of currently available triplestore technology, and so we do not expect the server implementation to be challenging.

Deployment and user buy-in will be at least as great a challenge. The PIs have a long and continuing involvement in the VO community, and so can lead this deployment and react quickly to user requirements. User acceptance can also be encouraged by producing exemplar applications, which illustrate how the architecture can be used and which are independently useful. We describe two such applications here, both of which we will implement during the course of the project.

Tagging resources and sharing bookmarks

The most basic use of the SAC network, used by both of the applications below and most immediately usable by existing user-facing applications, is to allow users to tag and bookmark resources on the web or within the VO, and to share those tags with other users. (Since tags and bookmarks are technically identical, and differ only in how they are used, we will talk only of tags below.) Web 2.0 services such as del.icio.us and Flickr have shown how successful simple tagging can be, both in letting users re-find resources they have found useful, and in telling them about resources they had not found before. We can do better than simple tagging, however, since a tagging application can use the semantic context available from the SAC to suggest and interpret tags, both when tagging and when querying. At least one existing VO application uses a private tagging framework, demonstrating that the demand is present.
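One simple way a tagging client could use what a SAC already holds to suggest tags is co-occurrence: propose the tags most often applied to the same resources as the tag the user is typing. This is a statistical sketch rather than full semantic inference; the `tag` predicate and resource names are hypothetical.

```python
from collections import Counter


def suggest_tags(tag_triples, typed: str, limit: int = 3) -> list:
    """Suggest tags that co-occur with `typed` on the same resources."""
    tags_by_resource: dict = {}
    for resource, predicate, tag in tag_triples:
        if predicate == "tag":
            tags_by_resource.setdefault(resource, set()).add(tag)
    counts: Counter = Counter()
    for tags in tags_by_resource.values():
        if typed in tags:
            counts.update(tags - {typed})
    # Rank by frequency, breaking ties alphabetically so output is stable
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:limit]]


triples = [
    ("r1", "tag", "quasars"), ("r1", "tag", "agn"), ("r1", "tag", "radio"),
    ("r2", "tag", "quasars"), ("r2", "tag", "agn"),
    ("r3", "tag", "quasars"), ("r3", "tag", "x-ray"),
    ("r4", "tag", "pulsars"), ("r4", "tag", "radio"),
]
print(suggest_tags(triples, "quasars"))  # ['agn', 'radio', 'x-ray']
```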

Application: Spacebook – semantic VRE

As the name suggests, Spacebook has an interface and (liberal) sharing model styled on the very successful social software application, Facebook. In the case of Spacebook, though, individuals will be able to create and share queries, workflows and assertions about VO resources, in addition to supporting a professional/social network. In this, Spacebook will be a type of Virtual Research Environment (VRE) with additional semantic functionality. The VRE aspect will include portlets which embed components from the AstroGrid VO project including: query construction and submission, workflow construction and submission, virtual storage and jobs status; all these components are available now. Analogously with Facebook, Spacebook will have the concepts of Person, Institute, Group and Project, with Institute membership keyed to a user’s institutional email address. Individual users may create Groups, and Spacebook administrators may create Projects.
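Keying Institute membership to a user's institutional email address could be as simple as a lookup on the address's domain, also accepting departmental sub-domains. The domain table and institute names below are hypothetical placeholders, not part of any existing Spacebook design.

```python
from typing import Optional

# Hypothetical registry of institutional mail domains
INSTITUTE_DOMAINS = {
    "example.ac.uk": "Example University",
    "observatory.example.org": "Example Observatory",
}


def institute_for(email: str) -> Optional[str]:
    """Map an email address to an Institute, or None if unrecognised."""
    domain = email.rsplit("@", 1)[-1].lower()
    parts = domain.split(".")
    # Try the full domain first, then strip leading labels, so that
    # astro.example.ac.uk still maps to example.ac.uk
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in INSTITUTE_DOMAINS:
            return INSTITUTE_DOMAINS[candidate]
    return None


print(institute_for("claire@astro.example.ac.uk"))  # Example University
```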

Scenario: Claire logs into Spacebook and sees a summary of activity in all the areas to which she belongs, including the current status of long-running jobs she has submitted. One such job, a complex workflow, has completed. She verifies that the results are valid, tags the workflow script to describe it, and then pushes the script into her project area [Spacebook transfers the script from Claire’s virtual storage area to the project’s; then, with her agreement, it picks up all the assertions in Claire’s SAC associated with this workflow, including those relating to the workflow’s components, and pushes them to the project’s SAC]. On a blog she reads about a new paper in her field, so she tags it for later reading [Spacebook adds assertions about the paper (via an arXiv URL) and passes the paper to a text-mining tool, which parses it for terms found in the VO astro-ontology, her SAC and federated SACs, and adds them to the assertions for that paper; Claire can review and change them when she later reads the paper]. She then moves into her Project area in Spacebook, where the workflow now appears as a new item. One of her colleagues has created a new version of a data analysis tool that implements an algorithm the project has developed. She makes this tool accessible to ‘friends’ in a Group created specially to test the tool [Spacebook copies assertions about the tool to the Group SAC]. Finally, Claire wants to run a query that a colleague has placed in the Project area, but over a different set of data sources. She begins typing into a search box; as she types each term, a graphical representation of associated terms appears, with tags that are often cited together appearing closer together. One term in the tag cloud catches her eye as crucial, and she clicks it to add it to her list of terms.
In a window separate to the tag representation, a list of data resources appears and is refreshed as she enters each term [as she types, Spacebook conducts searches on each term or set of terms through Claire’s personal SAC, the project SAC and all SACs to which these are federated; data resources associated with highly cited tags will appear on the resource list]. Claire picks the data sources she wants to use, submits her query and heads off for a coffee.
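The bracketed step where Spacebook pushes a workflow's assertions, including those about its components, to the project SAC can be sketched as a reachability walk over a "has component" relation followed by a copy. The `hasComponent` predicate, the `wf:` identifiers and the explicit `approved` flag (standing in for Claire's agreement) are all hypothetical.

```python
def assertions_about(triples: set, root: str, part: str = "hasComponent") -> set:
    """Collect every triple about `root` and, transitively, about its components."""
    subjects, frontier = set(), [root]
    while frontier:
        s = frontier.pop()
        if s in subjects:
            continue
        subjects.add(s)
        # Follow component links so assertions about sub-steps travel too
        frontier.extend(o for (subj, p, o) in triples if subj == s and p == part)
    return {t for t in triples if t[0] in subjects}


def push(personal: set, project: set, root: str, approved: bool) -> set:
    """Copy the selected assertions to the project SAC if the user agrees."""
    selected = assertions_about(personal, root)
    if approved:
        project.update(selected)  # a copy: the personal SAC keeps its triples
    return selected


claire = {
    ("wf:lightcurve", "hasComponent", "wf:extract"),
    ("wf:extract", "tag", "photometry"),
    ("wf:lightcurve", "tag", "variability"),
    ("arxiv:example", "tag", "to-read"),
}
project_sac: set = set()
push(claire, project_sac, "wf:lightcurve", approved=True)
print(len(project_sac))  # 3: the paper's tag stays private
```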

Application: Suggestions server

A continuing problem within the VO is that of browsing or searching the existing 24 registries for resources of interest, since the obvious ways of doing this produce either too few or far too many hits. The situation is improving with the arrival of better interfaces, but the semantically rich information available within the SAC network (the user’s local SAC plus those it federates to) would allow richer query support. We have preliminary designs for a ‘suggestions server’, acting as a web service, which would take a list of one or more resources of interest and return other sets of resources related to the initial ones. These relationships would be found by an open-ended set of algorithms, implemented as plugins to the server, using semantic relationships, connections to existing astronomical controlled vocabularies, and statistical cluster analysis.
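The plugin structure of the suggestions server might be organised as below: each plugin maps a set of seed resources to a ranked list, and the server returns one group per plugin. The single plugin shown ranks resources by Jaccard similarity of their tag sets, a simple stand-in for the semantic and cluster-analysis plugins mentioned above; all names are hypothetical.

```python
from typing import Callable

# Hypothetical plugin signature: (seed resources, triples) -> ranked resources
Plugin = Callable[[set, set], list]


def shared_tags(seeds: set, triples: set) -> list:
    """Rank other resources by Jaccard similarity of their tags to the seeds' tags."""
    tags: dict = {}
    for s, p, o in triples:
        if p == "tag":
            tags.setdefault(s, set()).add(o)
    seed_tags = set().union(*(tags.get(r, set()) for r in seeds))
    scored = []
    for r, t in tags.items():
        if r in seeds or not seed_tags:
            continue
        score = len(t & seed_tags) / len(t | seed_tags)
        if score > 0:
            scored.append((score, r))
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [r for _, r in scored]


def more_like_this(seeds, triples: set, plugins: dict[str, Plugin]) -> dict:
    """Return one group of suggestions per plugin, keyed by plugin name."""
    return {name: plugin(set(seeds), triples) for name, plugin in plugins.items()}


triples = {
    ("cat1", "tag", "quasars"), ("cat1", "tag", "radio"),
    ("cat2", "tag", "quasars"), ("cat2", "tag", "radio"), ("cat2", "tag", "agn"),
    ("cat3", "tag", "quasars"),
    ("cat4", "tag", "pulsars"),
}
print(more_like_this(["cat1"], triples, {"shared-tags": shared_tags}))
# {'shared-tags': ['cat2', 'cat3']}
```

As in the scenario that follows, how the groups are displayed is left entirely to the calling application.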

Scenario: Jules is writing an application to help users find new VO resources. His user has already identified a few useful resources, and Jules would like to find more similar ones. He makes a simple query to a suggestions server, listing the known resources, and asking for ‘more like this’; the server responds with groups of resources which are ‘like’ the initial set in various more-or-less heuristic ways, leaving Jules to display these to the user in whatever way best fits with his UI.

Other use-cases

Using NaCTeM tools, and other specialised text-mining tools developed with the VO, we can conceive of one or more SAC client applications deriving information from text sources and adding it to a personal or group SAC.
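A text-mining client of this kind ultimately produces assertions for a SAC. The sketch below is a deliberately naive stand-in, not a NaCTeM tool: it matches vocabulary terms as whole words in a text and emits one triple per term found. The `mentions` predicate, the vocabulary and the `arxiv:example` identifier are hypothetical; real tools would handle inflection, synonyms and context.

```python
import re


def extract_assertions(doc_uri: str, text: str, vocabulary) -> set:
    """Emit (doc_uri, 'mentions', term) for each vocabulary term found in text."""
    found = set()
    for term in vocabulary:
        # Whole-word, case-insensitive match; re.escape handles terms like "X-ray"
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add((doc_uri, "mentions", term))
    return found


abstract = "We report X-ray variability in a sample of quasars."
print(sorted(extract_assertions("arxiv:example", abstract,
                                ["quasars", "X-ray", "pulsars"])))
# [('arxiv:example', 'mentions', 'X-ray'), ('arxiv:example', 'mentions', 'quasars')]
```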

Another value-adding client application would be an access-control service, managing role and group information asserted within, and distributed amongst, the SACs. We have outline designs for such a service, which would build on the distributed nature of our semantic layer, but do not intend to implement it in this project, simply using it as one of the potential use-cases to drive the design.
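Although this service is out of scope, the core check it would need is easy to outline: pool membership assertions from several SACs and follow nested groups. The `memberOf` predicate and the group and project identifiers are hypothetical; the point is that neither SAC alone holds enough information to answer the question.

```python
def is_member(person: str, group: str, sacs) -> bool:
    """True if `person` is in `group`, directly or via nested groups, according
    to memberOf assertions pooled from several SACs."""
    member_of: dict = {}
    for triples in sacs:
        for s, p, o in triples:
            if p == "memberOf":
                member_of.setdefault(s, set()).add(o)
    seen, stack = set(), [person]
    while stack:
        node = stack.pop()
        for g in member_of.get(node, ()):
            if g == group:
                return True
            if g not in seen:
                seen.add(g)
                stack.append(g)
    return False


personal = {("claire", "memberOf", "group:tool-testers")}
project = {("group:tool-testers", "memberOf", "project:lightcurves")}
print(is_member("claire", "project:lightcurves", [personal, project]))  # True
print(is_member("claire", "project:lightcurves", [personal]))           # False
```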

Addendum on Skuas

Check out this Wikipedia article.

I don't know how long this link will be valid, but there is a short video clip here of the BBC Nature of Britain programme showing the Arctic Skua: the best bit shows them dive-bombing Alan Titchmarsh.

Moral of the story: when choosing project names, it helps to read all of the relevant Wikipedia entry. Skuas are scavengers (we'll take information from anywhere and process it into muscular goodness – good connotation); it turns out, however, that skuas are also short-tempered psychotic kleptoparasites who, when they've finished stealing other birds' food, will bully seagulls for fun – not such a good vibe. Hmm: can we change the project name to Helpful and Knowledgeable Fluffy Bunny Rabbit? Please?

Topic revision: r6 - 2007-11-04 - 16:45:03 - TonyLinde
 
This is the AstroGrid Development Wiki.
