As of this date (08-Jun-2003) there have been some long threads on the IVOA Registry mailing list
about the structure and content of the VO resource registry, see threads beginning:
These threads are edging towards consensus and I thought I'd pull together my own thoughts on registry structure and the plans for AstroGrid.
Note that these are a Sunday's musings and are open to wide discussion; they are not the final word on the AstroGrid architecture as regards registries.
To summarise the above threads:
- a registry will store information (metadata) about resources
- a registry will be maintained in one of three modes :
- full: which will keep details of all resources in the VO
- specialist: which keeps details of only specific resources
- source: which keeps details only about local resources for ease of harvesting
- all resources will be identified by a semantic-free resource identifier
- each resource belongs to only one class of resource (service, community, ...)
- each resource will describe itself using one or more metadata modules (catalog, image extraction, person, project, ...)
AstroGrid currently has plans for three registries. At first, it was believed that we could get all resources into one registry, but after looking into the likely entries in each registry and the ways they would be used, this was considered impractical and unnecessary. The three registries are:
- Service Registry
- where services are defined as resources which can be invoked and can perform actions of benefit to the caller
- (in general I would imagine such resources to be web or grid services but might be cgi scripts or other invokable scripts - applet, servlet?)
- Community Registry
- wherein are stored details of organisations, people, groups etc
- will try to follow the Globus CAS model
- MySpace Registry
- keeping track of data items belonging to one or more communities
- (at the moment, AstroGrid refers to the concept of MySpace but if we're to sell it to the VO community in general, we may need an alternative name: I have started a discussion in the forum and am using perSpace in the diagram and in what follows)
AstroGrid will introduce a fourth mode, which keeps details of one class of resource. This can be considered a combination of 'full' and 'specialist', in that it will not contain every resource in the known VO but will contain every resource within a class (so will constantly be updated).
Every resource, of whatever kind, will be identified by a two-part ResourceID, made up of an AuthorityID and a ResourceKey.
An authority is owned by a registry (and only by one registry), though the registry may hold it in trust for a person or organisation. A registry may own more than one authority. An authority is identified by an AuthorityID.
A registry may only register resources under one of the authorities it owns. If it allows the creation of new authorities, it must hold a list of all authorities known in the VO to ensure non-duplication.
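The ownership rules above can be sketched in a few lines of Python. This is only an illustration of the two checks (register only under an owned authority, refuse to create an authority that already exists in the VO); the class name, method names and in-memory sets are my own assumptions, not part of any AstroGrid design.

```python
class Registry:
    """Hypothetical registry that tracks the authorities it owns."""

    def __init__(self, name, known_authorities=()):
        self.name = name
        # Authorities owned by this registry (a registry may own several).
        self.owned = set()
        # Every authority known in the VO, needed to prevent duplicates.
        self.known = set(known_authorities)
        # Registered resources, keyed by (authority_id, resource_key).
        self.resources = {}

    def create_authority(self, authority_id):
        """Create a new authority, refusing any ID already used in the VO."""
        if authority_id in self.known:
            raise ValueError(f"authority already exists in the VO: {authority_id}")
        self.known.add(authority_id)
        self.owned.add(authority_id)

    def register(self, authority_id, resource_key, metadata):
        """Register a resource, but only under an authority this registry owns."""
        if authority_id not in self.owned:
            raise PermissionError(f"{self.name} does not own {authority_id}")
        self.resources[(authority_id, resource_key)] = metadata
        return (authority_id, resource_key)
```

With this sketch, a registry created with `known_authorities={"ivoa.net"}` can create and register under `ledas.star.le.ac.uk`, but any attempt to register under `ivoa.net`, or to create it again, is rejected.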
An AuthorityID will take the form of a DNS name (ivoa.net, www.astrogrid.org, ...). It can only contain valid alphanumeric characters and the '.' character.
The ResourceKey can be assigned either by the registry or by the registrant. Its only constraints are that it must be unique within the context of the authority and may only contain valid alphanumeric characters and the '.' and '/' characters.
A ResourceID can be catenated into a single string as:
scheme://AuthorityID/ResourceKey
where the scheme for all VO resources is ivo.
So, the LEDAS-based Chandra archive might be: ivo://ledas.star.le.ac.uk/chandra/; the id for Leicester University might be: ivo://leicester.ac.uk/university; and the id for a file within my perSpace directory might be: ivo://myspace.leicester.ac.uk/ael13/pub/docs/reg.zip.
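Judging from the examples above, the single-string form is the scheme, then '://', then the AuthorityID, then '/', then the ResourceKey. A minimal Python sketch of parsing and validating such a string follows. The function and class names are hypothetical, and splitting at the first '/' is my assumption (it follows from the AuthorityID not being allowed to contain '/').

```python
import re
from dataclasses import dataclass

# Character rules as stated in the text: alphanumerics and '.' for the
# AuthorityID; alphanumerics, '.' and '/' for the ResourceKey.
AUTHORITY_RE = re.compile(r"[A-Za-z0-9.]+")
KEY_RE = re.compile(r"[A-Za-z0-9./]+")
SCHEME_PREFIX = "ivo://"


@dataclass(frozen=True)
class ResourceID:
    authority_id: str
    resource_key: str

    def __str__(self):
        return f"{SCHEME_PREFIX}{self.authority_id}/{self.resource_key}"


def parse_resource_id(text):
    """Split an ivo:// string into its AuthorityID and ResourceKey."""
    if not text.startswith(SCHEME_PREFIX):
        raise ValueError(f"not an ivo identifier: {text}")
    # Assumption: the first '/' after the scheme ends the AuthorityID.
    authority, sep, key = text[len(SCHEME_PREFIX):].partition("/")
    if not sep:
        raise ValueError(f"missing ResourceKey: {text}")
    if not AUTHORITY_RE.fullmatch(authority):
        raise ValueError(f"invalid AuthorityID: {authority}")
    if not KEY_RE.fullmatch(key):
        raise ValueError(f"invalid ResourceKey: {key}")
    return ResourceID(authority, key)
```

For the perSpace example above, this yields the AuthorityID `myspace.leicester.ac.uk` and the ResourceKey `ael13/pub/docs/reg.zip`, and `str()` on the result gives back the original string.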
The big picture
As usual, I start with a humungous UML diagram:
In the diagram, ResourceRegistry is subclassed by ServiceRegistry, CommunityRegistry and perSpaceRegistry. Each of these classes of registry contains different types of resource.
Every registry must also contain at least one each of two unclassed types of resource:
- Registry : which identifies and describes the registry itself and (optionally) other registries in the VO
- Authority : which identifies the authority(ies) owned by that registry and (optionally) those owned by all other registries
There are presently three classes of registry that AstroGrid plans to implement:
- Service : implemented by the AstroGrid ServiceRegistry
- Community : implemented by the AstroGrid CommunityRegistry
- perSpace : implemented by the AstroGrid MySpaceRegistry
Other VO projects may choose to implement all three classes in a single registry.
The Service registry will store details of resources which can perform some action of benefit to the calling agent (look up and return data, create a catalog, return a cutout of a 2- or 3-D image, catenate VOTable structures, ...). The two types of resource to be stored are:
- Service : any type of service-based resource
- Reference : stores a reference to some metadata that many services may contract to provide (eg data collection which is referenced by many services)
(a query against the registry would not normally return a reference resource on its own)
The Community registry will store details of people, organisations, projects etc. The resources identified so far are:
- Community : where the community registry serves multiple communities
- Group : a collection of people both from within and outside the community
(access to service resources will most likely be done via groups)
- Person :
- Project :
- Organisation : university, institute, ...
The perSpace registry will store details of servers, files and database tables. The resources identified so far are:
- psServer : file and database servers where resources are physically stored
- File : data stored in files
- Table : data stored in database tables
Each resource listed will have sets of metadata associated with it. Every resource must include:
- IdentityMetadata : metadata such as ResourceID, Name, Description, ShortName
- ClassMetadata : metadata deriving from the class of resource, eg specific curation metadata (so ServiceClassMetadata will include invocation method, calling address etc)
In addition, each resource will contract to provide zero, one or more modules of metadata (MMs). Each MM will contain metadata specific to that type of resource. I shan't list and describe these as they are relatively self-explanatory. They also show my lack of knowledge of astronomy - I'll leave it to the experts to decide which MMs are required.
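To make the layering concrete, here is a rough Python sketch of a resource carrying its two mandatory metadata sets plus an open-ended collection of MMs. The field names are taken loosely from the text above; the class layout, the attribute names and the use of plain dictionaries for MMs are my own assumptions, not a proposed schema.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityMetadata:
    """Mandatory for every resource."""
    resource_id: str
    name: str
    description: str = ""
    short_name: str = ""


@dataclass
class ServiceClassMetadata:
    """Class metadata for the Service class (other classes would differ)."""
    invocation_method: str
    calling_address: str


@dataclass
class Resource:
    identity: IdentityMetadata
    class_metadata: object
    # Zero, one or more metadata modules, keyed by MM name.
    modules: dict = field(default_factory=dict)

    def provides(self, mm_name):
        """True if the resource contracts to provide the named MM."""
        return mm_name in self.modules


# Illustrative values only; the address and MM contents are made up.
chandra = Resource(
    identity=IdentityMetadata("ivo://ledas.star.le.ac.uk/chandra/", "Chandra archive"),
    class_metadata=ServiceClassMetadata("web service", "http://example.org/chandra"),
    modules={"CatalogExtraction": {}},
)
```

Here `chandra.provides("CatalogExtraction")` is true, while a resource created without any modules is still valid, since the MMs are optional.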
Queries against the registry should follow a pretty standard format. A query should contain two aspects:
- selection criteria : specified in some query language (whether VOQL, a derivative or something more specific)
- returned MMs : a list of the MMs which are required for the returned metadata (selection criteria could include resources which contract to provide CatalogExtraction metadata )
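The two aspects of a query can be sketched as a function taking a selection test and a list of MM names. In this Python illustration a plain predicate stands in for whatever query language is eventually chosen, resources are simple dictionaries, and the 'Curation' module and all values are invented for the example.

```python
def query_registry(resources, criteria, returned_mms):
    """Return the requested MMs for every resource matching the criteria."""
    results = []
    for resource in resources:
        if not criteria(resource):
            continue
        modules = resource["modules"]
        results.append({
            "resource_id": resource["resource_id"],
            # Only the MMs asked for, and only those the resource provides.
            "modules": {n: modules[n] for n in returned_mms if n in modules},
        })
    return results


# Made-up registry contents for illustration.
resources = [
    {
        "resource_id": "ivo://ledas.star.le.ac.uk/chandra/",
        "modules": {
            "CatalogExtraction": {"format": "VOTable"},
            "Curation": {"publisher": "LEDAS"},
        },
    },
    {
        "resource_id": "ivo://leicester.ac.uk/university",
        "modules": {"Curation": {"publisher": "Leicester"}},
    },
]


def provides_catalog_extraction(resource):
    """Selection criterion: the resource contracts to provide CatalogExtraction."""
    return "CatalogExtraction" in resource["modules"]


hits = query_registry(resources, provides_catalog_extraction, ["Curation"])
```

Note that the selection criteria and the returned MMs are independent: the query above selects on CatalogExtraction but asks only for the Curation metadata of the matching resource.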
The key issues still open are those of the structure and contents of the metadata modules.
- Do we go for small MMs and allow resources to select a wide range? or
- Do we just have a few MMs and most resources only select one (so lots of inappropriate metadata is not filled in)?
- Are the MMs hierarchical or class-structured, ie:
- If you contract to a low level MM then you must provide all the metadata belonging to parent, grandparent etc.? or
- Low level MMs include all the metadata from higher level MMs
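To show what the hierarchical option would mean in practice, here is a small Python sketch in which each MM names its parent and the full set of required metadata is found by walking up the chain. The module hierarchy and field names are invented purely for illustration; which MMs exist and what they contain is exactly the open question.

```python
# Hypothetical MM hierarchy: each module names its parent (or None).
MM_PARENT = {
    "Catalog": None,
    "CatalogExtraction": "Catalog",
}

# Hypothetical fields introduced by each module itself.
MM_FIELDS = {
    "Catalog": ["title", "columns"],
    "CatalogExtraction": ["query_interface"],
}


def required_fields(mm_name):
    """All fields a resource must supply for an MM, ancestors first."""
    fields = []
    while mm_name is not None:
        fields = MM_FIELDS[mm_name] + fields
        mm_name = MM_PARENT[mm_name]
    return fields


def missing_fields(mm_name, supplied):
    """Fields still missing from the metadata a resource has supplied."""
    return [f for f in required_fields(mm_name) if f not in supplied]
```

Under this reading, contracting to `CatalogExtraction` obliges a resource to supply `title` and `columns` as well as `query_interface`. The two hierarchical alternatives listed above would produce the same list and differ only in whether the parent fields are looked up or copied into the low level MM.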
Still lots of work to do
Please use the forum to discuss this document.
- 08 Jun 2003
I have begun trying to create a schema which reflects the above (which should be fun since I don't really understand schemas - let's see what sort of a mess I can create with XMLSpy). I'll add these as attachments which should appear below...
- 10 Jun 2003
Have uploaded a new version of the UML diagram. This, I hope, makes it clearer how the metadata modules relate to classes.
This is in response to Ray Plante's list message 'Re: AstroGrid registry structure' dated Wed Jun 11 2003 - 22:43:35 MEST.
I have tried to address Ray's point about not allowing a resource to include MMs from different classes by making the class metadata be composed of the appropriate MMs.
(It isn't quite right for ServiceCMd: want to say that this is composed of either
1..* ServiceMMs or
1..* ReferenceMMs but I don't know how to do that in Together.)
I'll look at creating a new xsd as well and uploading that.
- 12 Jun 2003 10:00
Later. New uml uploaded. I talked myself out of Ray's approach. I agree that we need to restrict the level at which multiplicity of MMs occurs in the model but I still think we need the MMs and the ability to provide more than one per resource. New uml allows a wider range of examples.
Schema also uploaded (tlRegistry.xsd).
- 12 Jun 2003 13:00
Latest version of schema uploaded (I've not updated the uml since the 12-Jun but see below for intermediate schema docs). I've tried to follow Ray's guidelines and this version (tlRegistry09.xsd) allows resources to be of different types by virtue of using the ResourceType (or derived type) as their base and being a substitution for the Resource element. I cannot say I fully understand the intricacies of schema construction but this does seem to have the desired effect.
- 22 Jun 2003
Bit later... I've modified these to a) make Class an attribute of ResourceType instead of an element and b) to derive by restriction rather than by extension so allowing me to change the fixed value of the class attribute for each type of resource (so 'Service' for ServiceType resources etc.). See: tlRegistry10.xsd
Still not right - cannot change class of Service/SkyService without the other changing!
- 22 Jun 2003