IVO names and locators
We need to identify things uniquely in the IVO, so we define naming schemes. We need to find those things on the network, so we define locator schemes. In each case, we follow established practice in IT and use Uniform Resource Identifiers (URIs).
However, there is disagreement on the details of how names are used, and there is no IVOA standard for the locators. This has led to the introduction, in AstroGrid-1, of locators based on special cases of the syntax of the names. There is a risk that identifiers cease to be intelligible across the IVO.
This paper suggests some rules for the use of identifiers.
Don't use URLs as names
A Uniform Resource Locator (URL) is a sub-class of URI, and URLs can be used as names under the IETF definition of URIs.
We should not do this in the IVO; it causes too much confusion when a given identifier may be either a name or a locator (e.g., in identifying a data file).
The scheme part of a name should not be a transport protocol that could appear in a URL (http etc.). It should be some other abbreviation. The use of ivo as a scheme for International Virtual Observatory Resource Names (IVORNs) is an example of good practice; ivorn as a scheme name would have been even better.
The authority part of a name should not be written in the URL syntax for an address. Rather than writing name-scheme://authority.formal.name/... we should write just name-scheme:authority.formal.name/.... This helps to distinguish names from locators. We should follow this rule for new naming schemes even if the authority names happen to be internet domain names or addresses. Established naming schemes (e.g. IVORNs) that use URL address-syntax are unfortunate but should not be changed.
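The two rules above can be checked mechanically. The following is a minimal sketch only: the set TRANSPORT_SCHEMES and the helper names are assumptions made for illustration and are not part of any IVOA standard.

```python
# Hypothetical set of transport protocols that may appear in URLs.
TRANSPORT_SCHEMES = {"http", "https", "ftp"}


def split_scheme(uri: str) -> tuple[str, str]:
    """Split a URI into (scheme, rest) at the first colon."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError(f"not a URI with a scheme: {uri!r}")
    return scheme.lower(), rest


def follows_naming_rules(uri: str) -> bool:
    """True if the URI obeys both rules proposed for new naming schemes."""
    scheme, rest = split_scheme(uri)
    if scheme in TRANSPORT_SCHEMES:
        return False  # the scheme is a transport protocol
    if rest.startswith("//"):
        return False  # the authority is written in URL address syntax
    return True
```

An established name such as an IVORN written with ivo:// fails the second check. That agrees with the text: such schemes are tolerated, but they are not a model for new ones.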
Use URI schemes to distinguish name resolvers
The scheme part of a name should indicate the class of resolving service: e.g. ivo indicates that the name is an IVORN and may only be resolved in an IVO resource-registry. Each class of resolver should have only one name scheme, and each name scheme should have only one class of resolver.
When IVOA introduces a new kind of name (e.g. names for nodes in VOSpace), then it should define a new scheme for the URIs.
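The one-to-one rule between schemes and resolver classes could be enforced by a small table. This is a sketch under stated assumptions: the class ResolverClasses and the resolver-class labels are invented for illustration.

```python
class ResolverClasses:
    """Keeps a one-to-one mapping between name schemes and resolver classes."""

    def __init__(self):
        self._by_scheme = {}
        self._by_resolver = {}

    def register(self, scheme, resolver_class):
        # Refuse a second resolver class for a scheme, and vice versa.
        if scheme in self._by_scheme:
            raise ValueError(f"scheme {scheme!r} already has a resolver class")
        if resolver_class in self._by_resolver:
            raise ValueError(f"{resolver_class!r} already has a name scheme")
        self._by_scheme[scheme] = resolver_class
        self._by_resolver[resolver_class] = scheme

    def resolver_class_for(self, name):
        # Only the scheme is inspected; the rest of the name is left alone.
        scheme, sep, _ = name.partition(":")
        if not sep:
            raise ValueError(f"no scheme in {name!r}")
        try:
            return self._by_scheme[scheme]
        except KeyError:
            raise LookupError(f"no resolver class for scheme {scheme!r}") from None
```

A scheme with no registered resolver class raises an error straight away, which reflects the point that a naming scheme without a resolver is of no use.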
By extension, IVOA should not define naming schemes for which there is no associated name resolver. It would be possible to define these schemes:
- sky-node
- vospace-files
- community
to denote different classes of IVO service. The rest of a URI in one of these schemes would identify the individual service uniquely. At first sight, this looks like a neat way of tagging service names such that services of a particular type can be picked out of a mixed list. However, if there is no name resolver for the scheme, this naming is useless. The reader cannot find the service to use it (the name is not a locator) and cannot get information about the service (there is no resolver to serve it). Other metadata are needed.
Don't parse IVORNs
An IVORN is a unique key into the IVO resource registry and it consists of an authority identifier and a resource key. The only meaning of the authority identifier is to name uniquely the entity that registered the IVORN. The only meaning of the resource key is to identify registry entries.
Any other meanings for parts of IVORNs are local conventions, not IVOA standards. As such, they are fragile, since non-conforming IVORNs may leak into such a local system and be misunderstood.
Software outside the registry should not:
- infer a service name or location by parsing the authority identifier;
- infer anything about the kind of resource by parsing the resource path;
- encode the resource location or type anywhere in the IVORN.
When given an IVORN, the only useful thing to do with it is to look up the metadata of that resource in the resource registry. When creating an IVORN, the only purpose is to let some other party look up those metadata.
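Treating an IVORN as an opaque key might look like the sketch below. The dictionary REGISTRY stands in for the resource registry; the IVORNs, field names and URLs in it are invented for illustration.

```python
# A stand-in for the resource registry: IVORN -> metadata record.
# All entries and field names here are hypothetical.
REGISTRY = {
    "ivo://example.authority/sky-node-7": {
        "type": "file-store",
        "access_url": "http://storage.example.org/files",
    },
}


def lookup(ivorn: str) -> dict:
    """Return the registered metadata; the IVORN is used only as a key."""
    try:
        return REGISTRY[ivorn]
    except KeyError:
        raise LookupError(f"not registered: {ivorn}") from None


def resource_type(ivorn: str) -> str:
    # The type comes from the metadata, never from the text of the IVORN.
    return lookup(ivorn)["type"]
```

The sample resource key contains the text "sky-node" but is registered as a file store. Software that parsed the key would be misled; software that looks up the metadata is not.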
Don't use IVORNs as locators
Locators should not be defined as a subset of IVORNs with special syntax. This breaks both the rule of using different schemes for different semantic classes and the rule against parsing IVORNs.
Don't use URLs to indicate resource type
A locator gives the location of an item on the network, nothing more. The syntax of a URL is fully occupied with giving the location and the transport protocol and should not be used to indicate semantics.
Software should not use a subset of URLs with special syntax to indicate a class of resource, such as a file within VOSpace, or a class of service. I.e., a programme should not choose what client software to use on the URL based on the authority or resource-path parts of the URL. The programme may, of course, choose client software to suit the scheme of the URL (HTTP etc.).
Instead, a programme should determine how to access a URL, and the type of data sent or fetched, by context. This means that stored URLs should be associated with other metadata that explain the context. The VOResource structures in the resource registry are an example.
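A programme following this rule might separate the two decisions as in the sketch below. The client labels and the record fields "url" and "interface" are assumptions for illustration; they are not taken from the VOResource schema.

```python
from urllib.parse import urlsplit

# Hypothetical transport clients, keyed by URL scheme only.
TRANSPORT_CLIENTS = {"http": "http-client", "ftp": "ftp-client"}


def choose_transport(url: str) -> str:
    """Pick client software from the scheme of the URL, and nothing else."""
    scheme = urlsplit(url).scheme
    try:
        return TRANSPORT_CLIENTS[scheme]
    except KeyError:
        raise LookupError(f"no client for scheme {scheme!r}") from None


def plan_access(record: dict) -> tuple[str, str]:
    """Combine the transport (from the URL) with the usage (from metadata)."""
    return choose_transport(record["url"]), record["interface"]
```

The authority and path of the URL are never inspected: the way the URL is to be used comes only from the metadata stored with it.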
Use ephemeral URLs
Suppose entity A wants entity B to access some data inside the IVO. A could determine a URL for the data and send it to B, but this is fragile:
- the data may have moved when B looks for them;
- B may not know the transport protocol in the URL;
- the transport protocol may not be the most-efficient one that B can use;
- B may not understand A's description of the type of data.
Instead, A should send the name of the stored data-set to B. B should resolve the name to a URL immediately before accessing the data; there may be a long delay between receiving the name and resolving it.
B should resolve the name in such a way that the semantics of the URL are made plain. This could be either that:
- B asks the resolver for a URL for the specific method of access (see example below); or
- the resolver returns the URL wrapped in metadata that indicate how it can be used.
As an example of the first case, B can ask either for the URL of the file on a web server or for the URL of a SOAP service that can apply an ADQL query to the file. Both are HTTP URLs, but B understands the vast difference in their use because it asked for one method of access specifically.
As an example of the second case, A can pass an IVORN and B can get the URL by retrieving the VOResource for the IVORN. In this case, B may find several different access URLs each qualified by position in the VOResource schema.
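The first case can be sketched as follows. The Resolver class, the access-method labels and the example name are hypothetical; a real resolver would be a network service rather than an in-memory table.

```python
class Resolver:
    """A toy name resolver: name -> {access method -> URL}."""

    def __init__(self):
        self._catalogue = {}

    def publish(self, name, method, url):
        # Record (or replace) the current URL for one method of access.
        self._catalogue.setdefault(name, {})[method] = url

    def resolve(self, name, method):
        try:
            return self._catalogue[name][method]
        except KeyError:
            raise LookupError(f"cannot resolve {name!r} for {method!r}") from None


def access(resolver, name, method, fetch):
    """Resolve the name immediately before use; the URL is never stored."""
    url = resolver.resolve(name, method)
    return fetch(url)
```

Because B holds only the name, the data can move between the time A sends the name and the time B uses it; B still gets the current URL for the method of access it asked for.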
Define semantic relationships between identifiers
Sometimes names have implied relationships. E.g., if vos:astrogrid-14/dir1/dir2 and vos:astrogrid-14/dir1/dir2/file1 are names for two entries in VOSpace, then it may be that dir2 is understood as the directory holding file1, such that the result of a "get contents" operation on dir2 includes (metadata of) file1. This kind of relationship is very valuable, but only where it can be relied upon.
Semantic relationships between names should only be assumed in software where they are defined in the specification of the name scheme. In the case of relationships between two naming schemes, the relationship needs to be defined in both specifications. In general, name-to-name translations should be left to name resolvers, which will use catalogues of translations instead of brittle assumptions about syntax.
URL schemes from the IT industry do not have defined relationships between individual URLs or with other schemes. Therefore, IVOA name schemes cannot have useful and universal relationships with URL schemes. Proper name resolvers using catalogues are needed.
Make name resolvers universal
Given the name of a resource, not necessarily an IVORN, an entity can determine the matching type of name resolver from the URI scheme. There are typically many such resolvers. The entity holding the name does not know which individual name resolver can resolve that name.
In principle, the enquirer can look for the "right" name resolver in the resource registry: the query would be like "return services of type 'xyz-resolver' capable of resolving 'name'". This presumes either that:
- each name resolver writes in its resource-registry entry every name it can resolve (messy, inefficient and vulnerable to propagation lags within the registry); or
- the registry parses the name and chooses the resolver based on a set of assumptions about the name (deprecated above).
Neither option is very good.
Instead, name resolvers should be made universal such that any name resolver for a scheme can resolve any name in the scheme. The resource registry has already achieved this.
Universal recognition of names implies exchange of information between resolvers. This can either be done synchronously, at the time when a name is resolved (resolution requests are forwarded between resolvers) or asynchronously, ahead of time (resolvers "harvest" information periodically, as with the resource registry). Asynchronous resolution makes for looser coupling and a more robust system; but it suffers from propagation delays and race conditions in updates. Synchronous coupling is more fragile but works better for quickly-changing sets of names.
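The trade-off between the two styles can be seen in a toy model. The NameResolver class below is an assumption made for illustration; it keeps everything in memory and ignores networking, security and failure handling.

```python
class NameResolver:
    """A toy resolver that knows its own names and its peer resolvers."""

    def __init__(self, label):
        self.label = label
        self.local = {}      # names registered at this resolver
        self.harvested = {}  # copies of names taken from peers
        self.peers = []

    def resolve_sync(self, name, _seen=None):
        """Synchronous style: forward the request to peers at resolve time."""
        seen = _seen if _seen is not None else set()
        seen.add(self.label)  # guard against forwarding loops
        if name in self.local:
            return self.local[name]
        for peer in self.peers:
            if peer.label not in seen:
                try:
                    return peer.resolve_sync(name, seen)
                except LookupError:
                    continue
        raise LookupError(name)

    def harvest(self):
        """Asynchronous style: copy the peers' names ahead of time."""
        for peer in self.peers:
            self.harvested.update(peer.local)

    def resolve_async(self, name):
        """Answer from local and harvested data only; the answer may be stale."""
        if name in self.local:
            return self.local[name]
        if name in self.harvested:
            return self.harvested[name]
        raise LookupError(name)
```

If a peer changes a name after the harvest, resolve_async keeps returning the old value until the next harvest, whereas resolve_sync returns the new value at once but depends on the peer being reachable at that moment.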
Worked example: VOSpace
VOSpace is to maintain a collection of data sets (including both files and non-files, such as DB tables) organized hierarchically. Items in VOSpace may be long-lived and their details may be passed around the IVO. VOSpace is to be composed of several different kinds of storage service (web services, FTP servers, DBMS, SRB installations). Items may move between physical storage locations while retaining their places in VOSpace. Items in VOSpace have strong implied interrelationships which the end-users are encouraged to use to understand their data holdings.
Because the data sets move in physical storage, and because of the interrelationships between the data sets, locators are not sufficient. VOSpace needs names for data sets as well.
Because VOSpace contents change quickly, and because race conditions matter, the resource registry is not a suitable name resolver; VOSpace names should not be IVORNs.
IVOA does not yet have any name schemes other than IVORNs. Therefore, VOSpace names are something new. Therefore they need a new name scheme, vos, say.
Because vos names are not URLs, they do not include addresses. Therefore, they do not start vos://; no double slash is required before the resource-path part.
The resource-path part of a vos URI maps to a part of the virtual file-tree. Levels of the path are delimited by forward slashes. Each level of the file tree is defined to contain levels following it, in the manner of a Unix file system. This means that recursive operations on the tree are possible. In particular:
- any VOSpace name-resolver has an operation to list the vos URIs for all immediate children of a node with a given vos URI;
- the parent of a node with a given URI can be determined by truncating the URI back to the preceding slash, and this may be done by any entity, not just by a name resolver.
This means that the tree can be "walked".
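Both operations are simple enough to sketch. In the code below, list_children stands in for the resolver operation that lists immediate children. Keeping the trailing slash on the truncated URI is an assumption; the text does not say whether the slash is retained.

```python
def parent(vos_uri: str) -> str:
    """Truncate a vos URI back to the preceding slash (slash retained)."""
    trimmed = vos_uri.rstrip("/")
    head, sep, _ = trimmed.rpartition("/")
    if not sep:
        raise ValueError(f"{vos_uri!r} has no parent in its path")
    return head + "/"


def walk(list_children, top):
    """Yield top and every descendant, depth first.

    list_children is a callable that returns the vos URIs of the
    immediate children of a node, as a VOSpace name-resolver would.
    """
    yield top
    for child in list_children(top):
        yield from walk(list_children, child)
```

Note that parent needs no resolver at all, while walk relies on one for every step downwards.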
The VOSpace tree has an implicit, single root-directory. VOSpace services offering storage to the IVO define a "branch" for that storage - i.e. a sub-tree of VOSpace directories - that they may "graft" onto the single root or may keep separate. The authority part of a vos URI defines the branch of the tree and enables a VOSpace name resolver to locate that branch, even if the branch is not grafted.
Including a vos authority name as the name of a VOSpace directory denotes a graft of one branch onto another.
Authorities for vos URIs must be unique for all time within VOSpace but otherwise can be any string that conforms to the IETF rules for authorities. The authority identifier might be listed in the resource registry, and might be used in other contexts than VOSpace, but users and programmes should not rely on this.
A vos URI with a blank authority refers to the root of the tree.
Examples:
- vos:star.le.ac.uk:myspace/ A branch of VOSpace called "myspace" registered by "star.le.ac.uk" (note lack of preceding double-slash); in fact, the top directory of this branch (note presence of single slash at the end). We might guess from the name that this is on an AstroGrid MySpace server at the University of Leicester. But our software mustn't guess; it must get these metadata from a VOSpace name-resolver.
- vos:star.le.ac.uk:myspace/a.e.linde A directory inside the previous example. It's Tony Linde's home directory in MySpace, but again we must not assume that in software; we'd need to resolve the name to find out.
- vos:/star.le.ac.uk:myspace/a.e.linde The previous directory expressed as a graft on the root of the tree.
- vos:/ast.cam.ac.uk:myspace/../star.le.ac.uk:myspace/a.e.linde A tree-walk from the top of another branch to Tony Linde's space.
- vos:ast.cam.ac.uk:myspace-scratch-50839467/ The top directory of a scratch area called "myspace-scratch-50839467" allocated by "ast.cam.ac.uk" (IoA Cambridge). The scratch area is a time-limited lease to a user, say to Tony Linde, of some storage space.
- vos:star.le.ac.uk:myspace/a.e.linde/ast.cam.ac.uk:myspace-scratch-50839467 The scratch area grafted onto Tony Linde's home directory.
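The examples above can be separated into authority and path by a few lines of code. This sketch rests on one reading of the examples, namely that the authority is everything between "vos:" and the first slash (e.g. "star.le.ac.uk:myspace"); the function name split_vos is hypothetical.

```python
def split_vos(uri: str) -> tuple[str, str]:
    """Split a vos URI into (authority, path).

    Assumes the authority is everything between 'vos:' and the first
    slash; it is blank when the URI starts at the root of the tree.
    """
    scheme, sep, rest = uri.partition(":")
    if scheme != "vos" or not sep:
        raise ValueError(f"not a vos URI: {uri!r}")
    if rest.startswith("//"):
        raise ValueError("vos URIs do not use the URL double slash")
    authority, slash, tail = rest.partition("/")
    return authority, slash + tail
```

The split yields only the branch and the path within it. What the branch is and where it is stored must still come from a VOSpace name-resolver.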
Summary
Names and locators are precious to the IVO for their power and universality, but names are only unique identities and locators only denote places and transport schemes; neither is a container for extra metadata.
We should use names, locators and metadata structures separately, each for what it is fit, and preserve the original purpose of each.
--
GuyRixon - 08 Jul 2004