VOSpace20070821 - r3 - 23 Aug 2007 - 17:19:40 - DaveMorris
These are the replies from the AstroGrid team to the RFC comments published on the IVOA wiki.


Comments from MarkTaylor

Comment #1

  • Description members:
    • The Description members of various items are characterised "A text block describing ...". For implementors and users of this standard it would be helpful if a bit more detail about the intended format of this text block could be supplied, for instance should it be a short summary or a detailed description and should newlines and spaces in it be honoured or are multiple whitespaces insignificant. A mismatch in interpretation of this sort of thing between metadata supplier and data consumer can lead to ugly/unreadable presentation of such items to the user. I understand that the details of how various descriptions are to be written is somewhat dependent on schemas external to this document yet to be written, so possibly this can't be clarified at this stage.

Once we get a valid schema for registering these things, we should register a core set of properties, views and protocols. We can then use these as examples of 'best practice' in the specification.

Comment #2

  • getProperties operation
    • accepts return value is described as "A list of identifiers that the service accepts and understands". What does "understands" mean here?

This is a way for the service to declare that it will interpret specific properties as having specific meanings.

As an example:
By default, all properties are treated as string name:value pairs, or more accurately as uri:value pairs. So a client can set a generic property, and the service will just store it as a string.

If we have a property URI that represents 'file mime type', then a service that provides HTTP GET access could add the property value to the relevant header field of the HTTP GET response. By including this property URI in both its 'accepts' and 'provides' lists, the service is declaring that it will allow clients to modify the value, and that it understands that the property should be added to the relevant header field of an HTTP GET response.

A very basic service implementation might not support adding the mime type header to an HTTP GET response. As the service does not 'understand' the property, it should not list the 'file mime type' property URI in its 'accepts' or 'provides' list. A client can still set the property, but the service would just store it as a string value and would not try to interpret any meaning from it.

If we have a property URI that represents 'data size', then its value depends on the actual data itself, and logically it should not be possible to set it from outside. If a service 'understands' the 'data size' property, then the service will generate the value directly from the stored data. If a client attempts to modify it from outside, then the service would throw a PermissionException. By including this property URI in the 'provides' list, but not the 'accepts' list, the service is saying that it will generate the property from the data, but it will not allow a client to set it.
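A minimal Python sketch of the 'accepts'/'provides' behaviour described above. The property URIs are hypothetical placeholders (no core properties have been registered yet), and mapping the mime type onto a Content-Type header is just one way a service might act on a property it 'understands'.

```python
from typing import Dict, Optional

# Hypothetical property URIs: placeholders until a core set is registered.
MIME_TYPE = "ivo://example.org/vospace/core#mimetype"
DATA_SIZE = "ivo://example.org/vospace/core#length"


class PermissionException(Exception):
    """Raised when a client tries to set a property the service generates."""


class Node:
    def __init__(self, data: bytes):
        self.data = data
        self.properties: Dict[str, str] = {}


class Service:
    # Properties the service understands AND lets clients set.
    accepts = {MIME_TYPE}
    # Properties the service understands and can report.
    provides = {MIME_TYPE, DATA_SIZE}

    def set_property(self, node: Node, uri: str, value: str) -> None:
        if uri in self.provides and uri not in self.accepts:
            # Understood, but generated from the data: read-only to clients.
            raise PermissionException(uri)
        # Anything else, understood or not, is stored as an opaque string.
        node.properties[uri] = value

    def get_property(self, node: Node, uri: str) -> Optional[str]:
        if uri == DATA_SIZE:
            return str(len(node.data))  # computed from the stored bytes
        return node.properties.get(uri)

    def http_get_headers(self, node: Node) -> Dict[str, str]:
        # Because MIME_TYPE is 'understood', it changes service behaviour.
        headers = {}
        if MIME_TYPE in node.properties:
            headers["Content-Type"] = node.properties[MIME_TYPE]
        return headers
```

A client can still set an arbitrary URI such as `ivo://example.org/custom#note`; this service simply stores the string and attaches no meaning to it.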

Comment #3

    • contains return value: Is it wise to require this as a return? If the VOSpace service is implemented in such a way that it can accept arbitrary properties to be associated with each node, then I'd have thought this could be rather a large list and expensive to determine.

I've not seen a valid use case for having this list. Perhaps someone can supply one?

Comments from RayPlante - 20 Aug 2007

Comment #4

  • I think we need to treat these specifications like any other peer-review publication; thus, it is appropriate to include acknowledgements to our funders. For contributions by Matthew, I suggest the boiler-plate that I provided in my SSO comments.

Yes

Comment #5

  • There's an additional bit of boiler plate that I like to see in these documents: a section that defines abbreviations that we insiders are familiar with. Again, you can consider my suggested text for a preface section called "Definitions".

Yes

Comment #6

  • I would strongly encourage numbering the sections of the document in the style "1.4.2". This allows one to refer to a specific definition or requirement with a little more specificity (e.g. "This service is not compliant because it does not ... as required by the VOSpace specification, section 2.3.1").

Yes (I originally had numbering on, but had problems with it and dropped it due to time constraints).

Comment #7

  • Introduction: This document would greatly benefit from a subsection in the introduction (e.g. "Typical Use of a VOSpace Service") that steps us through an example of putting and retrieving data to/from a VOSpace, showing sample SOAP messages. This should introduce the major concepts, mapping them to concepts we understand (e.g. Node represents a file) without necessarily defining them generally. As it is now, it provides a bottom-up description of the service, which is what you want in a spec; however, the reader doesn't discover how it all fits together until the end. (I felt like I was reading a mystery novel, flipping the pages back to see how I was supposed to understand the clues revealed earlier. ;-) )

Yes

Comment #8

  • Introduction: Along the same theme, it would also be good to end the introduction with another subsection (e.g. "Document Roadmap") that describes how the rest of the document is laid out. With this road map in mind, the reader will understand how the parts will be fitting together as he/she is reading them.

Yes

Comment #9

  • Nodes and node types: please repeat the definition of VOSpace here. (It's given in the Abstract; however, an abstract is usually a summary of the contents of the document.) Then think carefully about how the term is being used in the section and make sure you are being consistent. I suspect that some clarification of the definition will be needed, but I'm not sure.

Need to check

Comment #10

  • Nodes and node types: please provide a clearer semantic definition of node before launching into a definition of the types of nodes. It should be clear (perhaps with additional explanation) before talking about the types that a data file (something we all understand) is represented as a node.

Need to check

Comment #11

  • Property identifiers: I strongly recommend that when IVOA identifiers are used to identify properties (as well as views and protocols) that they use the form, ivo://auth/blah#property-name. As the authors know, every IVOA Identifier must resolve to a separate resource description. By using the pound delimiter, all of the method names can be defined within a single resource description. The latest internal WD version of VOStandard (v0.2) supports this type of definition. We can push this to a released WD to help support VOSpace.

Probably yes, once we have defined exactly what the '#' means.

Comment #12

  • Views: In keeping with the above recommendation, I suggest that standard views be defined with one of the following forms:
    • ivo://ivoa.net/vospace#view-any, to include the definition in the standard that describes the VOSpace standard as a whole, or
    • ivo://ivoa.net/vospace/views#any, to have a resource specifically for the definition of views.
I prefer the first alternative, because it consolidates all the VOSpace information in one resource document, which will make maintenance simpler and make the registry seem less cluttered.

For the standard set of core properties, protocols and views I'd suggest:

  • ivo://ivoa.net/vospace/core#view-any

So one resource document for the core properties, protocols and views.
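If the '#' form is adopted, one reading is that the part before '#' names a single registry resource and the fragment names one definition within it. A minimal Python sketch under that assumption (the meaning of '#' is still to be agreed; only `#view-any` appears in this discussion, the other two entry names are hypothetical):

```python
from typing import Tuple


def split_identifier(uri: str) -> Tuple[str, str]:
    """Split 'resource#entry' into (resource URI, entry name)."""
    resource, sep, entry = uri.partition("#")
    if not sep or not entry:
        raise ValueError(f"no '#' entry name in {uri!r}")
    return resource, entry


# Only '#view-any' comes from the discussion; the other two
# entry names are hypothetical.
identifiers = [
    "ivo://ivoa.net/vospace/core#view-any",
    "ivo://ivoa.net/vospace/core#protocol-http-get",
    "ivo://ivoa.net/vospace/core#property-mimetype",
]

# All three share one resource document, so at most one registry
# lookup would be needed to describe them all.
documents = {split_identifier(uri)[0] for uri in identifiers}
```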

However

  • First we need to finalise the schema for registering these things (including defining exactly what the '#' means and how it should be used).
  • Then we need to set up the ivo://ivoa.net/vospace registry to contain these things
  • Then we need to define the core set of properties, protocols and views.
  • It also depends on how many things we end up defining in the core document.

It should also be clear that although putting multiple definitions into one resource may be 'best practice', it is not mandatory.

We need to get on with defining the registration schema, and publish some consistent examples for the core properties.

Comment #13

  • View descriptions, last sentence: The Registry WG can provide the clarity needed to replace this sentence with a more definitive statement on the timescale of the approval process for this document. In particular, the RWG should:
    • add support for DisplayName in the VOStandard schema (if desired)
    • release updated schema to WD status

There may be more things we could suggest for the registry core schema.

We need to get on with defining the registration schema, and publish some consistent examples for the core properties.

Comment #14

  • Protocol identifiers: change form of IVO-ids to form using pound (#).

Probably yes, once we have defined exactly what the '#' means.

Comment #15

  • Local NFS transfers: change form of IVO-ids to form using pound (#).

Probably yes, once we have defined exactly what the '#' means.

Comment #16

  • Web service operations: it would be helpful to be a bit more explicit that when you say "description of the Node" you mean a structure defined in terms of the model specified on p. 9. This can be done either by saying something like "(type: Node)" or with an explicit page or section reference to where Node is defined.

Need to check

Comment #17

  • pushToVoSpace, Returns: it would be helpful to state here that the endpoint refers to the destination URL (yes?)

In the general case, yes, but we need to provide specific examples.

Comment #18

  • pullToVoSpace, Parameters: it would be helpful to state here that the endpoint refers to the source URL (yes?). (And so on with other operations.)

In the general case, yes, but we need to provide specific examples.

Comment #19

  • Can we include/reference the official standard WSDL?

Yes

Comment #20

  • Can we define the Registry VOResource extension for registering VOSpaces? Is this coming?

We need to get on with defining the registration schema, and publish some consistent examples for the core properties.
People won't accept the service specification without this.

Comments from DougTody - 20 Aug 2007

Comment #21

  • It is not enough for Grid technology like VOSpace to be demonstrated in code written by the design team; we need to verify that it is useful in actual real world applications, written by others. While it is useful to have a specification for trials, I am not sure we should accept this as a proven standard until this has been demonstrated in real applications.

Yes

Doug's position is inconsistent with the IVOA rules. It also introduces a paradox: if there is no standard, who outside the "design team" would implement it? And if there is no standard, then presumably it is not being used in "real-world" applications available to end users. In any case, no validating external implementation was produced for SSAP, and that has gone to PR. -- GuyRixon - 22 Aug 2007

Comment #22

  • As it stands though I am not sure I can understand all the details required to use this for basic data storage and transport. The key issue is that details essential for basic data manipulation, such as data format, basic file attributes, and transport protocol (HTTP etc.), are not addressed directly in the specification. This appears to be left to the service implementor, or to the client trying to upload data, to describe indirectly in registry records (View, Protocol, Transfer, etc.) independent of the actual VOStore specification. While the flexibility to describe arbitrary data formats and transport protocols is nice, this approach appears overly general, with the result that the basic VOSpace specification will be hard and ambiguous to use and does not adequately address basic data management using common formats and protocols.

I agree.
The specification is intentionally abstract, but we could/should provide a supplementary document that describes how a simple http get and put service would work in real life.

Comment #23

  • In my view, a basic architectural principle for services is that it should be possible to understand and use any service stand-alone, independent of other software such as the registry (although the registry might be used for related higher level functions such as discovery). This does not appear to be the case here, as fundamental information about transport details and data formats or attributes are only available via indirect URIs which are intended to be registry resolvable (there are some weasel words about nonresolvable URNs, but clearly this is discouraged and registry integration is the intention).

Again, good point. The supplementary document could provide a concrete example of how this would work.

Once we have registered the core properties, protocols and views then these will all become fixed URIs. Although the URIs would refer to descriptive registry resources, in practice simple clients and services would not need to dereference these via the registry.

Once these standard URIs are set, then they just become constant strings, and can be hard coded into both service and client. If both ends implement one of the standard protocols, then they can transfer data without needing to refer to the registry.

During normal operation neither the service nor the client should need to dereference any of the URI identifiers. They can all be treated as opaque identifier strings.
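To illustrate treating the identifiers as opaque strings, here is a minimal Python sketch of protocol negotiation. The protocol URIs are hypothetical placeholders for the eventual core set; the point is that matching is plain string equality, with no registry lookup.

```python
from typing import List, Optional

# Hypothetical protocol URIs, standing in for the registered core set.
HTTP_GET = "ivo://example.org/vospace/core#http-get"
FTP_GET = "ivo://example.org/vospace/core#ftp-get"

# Protocols this client implements, in order of preference.
CLIENT_SUPPORTS = [HTTP_GET, FTP_GET]


def choose_protocol(client: List[str], service: List[str]) -> Optional[str]:
    """Return the first client-preferred protocol the service also offers."""
    offered = set(service)
    for uri in client:
        if uri in offered:  # opaque string comparison, no dereferencing
            return uri
    return None
```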

However this isn't clear in the way that the specification is worded.

If the supplementary document provided a concrete example of how a simple http get and put service would work, then this would become a lot clearer.

Use case #1:

A simple client tries to access data in a service that only supports encrypted or authenticated protocols. The client does not understand these protocols, but it can still list the (unknown) protocol URIs in its error message:
    Unable to find common transfer protocol.
    Client supports :
        ivo://aaaa
        ivo://bbbb
    Service provides :
        ivo://xxxx
        ivo://yyyy

A slightly more complex client may dereference these URIs and show the display names of the unknown protocols:

    Unable to find common transfer protocol.
    Client supports :
        "Simple http put" (ivo://aaaa)
        "Simple ftp  put" (ivo://bbbb)
    Service provides :
        "Secure http put" (ivo://xxxx)
        "Secure ftp  put" (ivo://yyyy)
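A minimal Python sketch covering both kinds of client in use case #1: the same function prints raw URIs, or display names when it is given some way to look them up. The `display_name` callable is an assumption; it could be backed by a registry query or by a hard-coded table.

```python
from typing import Callable, List, Optional

DisplayName = Callable[[str], Optional[str]]


def no_common_protocol_message(client: List[str], service: List[str],
                               display_name: Optional[DisplayName] = None) -> str:
    """Build the error text, using display names where they are known."""
    def label(uri: str) -> str:
        name = display_name(uri) if display_name else None
        # Fall back to the raw URI when no name is available.
        return f'"{name}" ({uri})' if name else uri

    lines = ["Unable to find common transfer protocol.", "Client supports :"]
    lines += ["    " + label(uri) for uri in client]
    lines.append("Service provides :")
    lines += ["    " + label(uri) for uri in service]
    return "\n".join(lines)
```

Passing `names.get` for a dict such as `{"ivo://aaaa": "Simple http put"}` gives a mix of named and raw entries, which is what a client would show when only some URIs can be resolved.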

The user can then look for a different client tool that implements one of the secure protocols (by searching the registry for applications that support the protocol URI).

Alternatively, they can send a request to the client application developer asking whether they plan to support the new protocol. In that case, all the astronomer needs to pass to the developer is the protocol URI (ivo://xxxx). The protocol description should provide enough information to enable the developer to implement it. The simplest case would be to include a URL pointing to an external resource containing the full protocol specification.

Use case #2:

A simple service implementation begins to see a lot of rejected transfers in its logs, for a new secure protocol that the service does not implement.

If the logs show enough requests for this protocol, the service provider may decide that it is time to update the service to support it. The provider can pass the protocol URI to their development team, who can then look up the description in the registry.
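Use case #2 might look like the following Python sketch, which counts rejected protocol URIs in a service log. The log format and the threshold are hypothetical; a real service would adapt this to whatever it actually logs.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical log format, one line per rejected transfer request, e.g.
#   2007-08-22T10:15:00 REJECTED protocol=ivo://xxxx
MARKER = " REJECTED "


def rejected_protocols(log_lines: Iterable[str]) -> Counter:
    """Count rejected transfer requests per protocol URI."""
    counts: Counter = Counter()
    for line in log_lines:
        if MARKER in line:
            _, found, uri = line.partition("protocol=")
            if found and uri.strip():
                counts[uri.strip()] += 1
    return counts


def candidates_to_support(log_lines: Iterable[str], threshold: int) -> List[str]:
    """Protocol URIs requested at least `threshold` times, most frequent first."""
    return [uri for uri, n in rejected_protocols(log_lines).most_common()
            if n >= threshold]
```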

Again, the protocol description should contain enough information to enable the developer to implement it. The simplest case would be to include a URL pointing to an external resource containing the full protocol specification.

Use case #3:

A complex GUI client may dereference the protocol URIs and use the display name and description to present the options to the user in a more user friendly way. To speed things up, the client implementation may hard code the most common ones and cache the unusual ones.

The GUI tool may use the descriptive names and text from the registry resource to populate select lists, tool tips or help boxes.
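The hard-code-then-cache approach in use case #3 could be sketched in Python as below. The `lookup` function stands in for a registry query and is hypothetical; note that this sketch also caches failed lookups, so an unknown URI is only queried once per session.

```python
from typing import Callable, Dict, Optional


class DisplayNameResolver:
    """Hard-coded names for common protocols, cached lookups for the rest."""

    def __init__(self, lookup: Callable[[str], Optional[str]],
                 known: Dict[str, str]):
        self._lookup = lookup            # hypothetical registry query
        self._cache: Dict[str, Optional[str]] = dict(known)

    def __call__(self, uri: str) -> Optional[str]:
        if uri not in self._cache:
            # First sight of this URI: look it up once and remember the
            # result, including a failed (None) lookup.
            self._cache[uri] = self._lookup(uri)
        return self._cache[uri]
```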

Comment #24

  • Basic things such as a data format ("view") or available transfer protocol (e.g., HTTP) are described indirectly in descriptors which are stored in a registry and referenced by URI (so far as I can tell). Although it does not explicitly say, I suspect we might be able to use string equality on such a URI to test for these things (as one would test a MIME type for example), but in principle we would need to look the URI up in the registry, and parse the XML data structure which comes back, before we can determine basic information about what a service can do. The actual spec repeatedly says things like "at the time of this writing, the schema for registering in the IVO registry has not been finalized". Even ignoring the undesirable registry interaction, which should not be required to directly use a service, it does not appear that such details are sufficiently specified at this point.

Once the core set of properties, protocols and views have been registered, simple client and server implementations may treat the URIs as hard coded string constants. There is no need to dereference the registry URIs unless a complex client wants to use the information to display the user friendly names.

We need to finalise the registration schema and publish the core set of properties, protocols and views before people can start to use this.

Comment #25

  • The interface defines ad hoc SOAP methods which can query the service capabilities; these (appear to) return obscure, indirect URIs which contain further information on service capabilities such as what transport protocols the service supports. Whatever happened to the proposed getCapabilities operation, which is supposed to describe the capabilities of a service instance? This would seem to be the obvious way to describe things such as what formats (views), or transport protocols a given service instance supports.

As far as I am concerned, the 'server wide' getProtocols(), getViews() and getProperties() are redundant.
Does anyone have a detailed use case that requires these ?

However, Doug makes a good point: if we are going to have these, then it would make sense to put them in the service registration.

Eventually, I would prefer to use VOSI rather than ad-hoc methods. However, it is very disruptive to issue successive versions of the registration schema, whereas we can change the ad-hoc methods each time we update the standard. I suggest that we keep the ad-hoc methods for now and aim to replace them with capability metadata for VOSpace 1.1 and later. -- GuyRixon - 22 Aug 2007

Comment #26

  • Although the spec says it does not attempt to address hierarchy, the restriction to "no slashes" in a URI path appears arbitrary. The internal logical structure implied by a pathname should be transparent to something like VOSpace, for which this is merely a component of a string identifying the file within a VOSpace. This might change if the VOSpace supports "directory"-level operations, but this could be added without affecting file pathnames within a VOSpace. Otherwise, to flatten a directory hierarchy, it will be necessary for a VOSpace to invent arbitrary filenames, unique within a VOSpace, to substitute for pathnames containing a slash.

This was intentional. We plan to get v1.1 agreed as soon as possible, and that would support a hierarchical data structure, with '/' as the path delimiter. This means that '/' will have a specific meaning in vospace 1.1, so to avoid confusion between the two we want to exclude it from use in vospace 1.0.

A vospace 1.1 service might publish a sub directory as a vospace 1.0 service, allowing vospace 1.0 clients to access data within just that sub directory. In that case, we want the URI identifiers to make sense in both systems.

If we allowed '/' in vospace 1.0 names (and just treated them as normal strings without interpreting the '/'), then there would be an 'odd' difference in behaviour between vospace 1.0 and vospace 1.1 services:

  • If '/' were not excluded, then you could create something called 'a/b/c/d' in vospace 1.0 even though the implied parent 'a/b/c' didn't exist (much like the current registry).
  • In vospace 1.1, creating 'a/b/c/d' would fail if 'a/b/c' didn't exist, because it treats the '/' as a path delimiter.

We wanted to avoid this discrepancy by explicitly excluding '/' in vospace 1.0.
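The difference between the two naming rules, as a minimal Python sketch over an in-memory set of node names. The function names are hypothetical, and the 1.1 behaviour reflects the plan described above rather than a finished specification.

```python
from typing import Set


class NamingError(ValueError):
    """Raised when a node name is not acceptable."""


def create_v10(nodes: Set[str], name: str) -> None:
    # VOSpace 1.0: flat namespace, and '/' is simply not allowed in a name.
    if "/" in name:
        raise NamingError(f"'/' is not allowed in VOSpace 1.0 name {name!r}")
    nodes.add(name)


def create_v11(nodes: Set[str], path: str) -> None:
    # VOSpace 1.1 (as planned): '/' is a path delimiter, so the parent must exist.
    parent, sep, _ = path.rpartition("/")
    if sep and parent not in nodes:
        raise NamingError(f"parent {parent!r} does not exist")
    nodes.add(path)
```

Because 1.0 rejects 'a/b/c/d' outright, a name that is valid in 1.0 never has a different meaning when the same space is later served by a 1.1 service.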

Comment #27

  • I am not convinced of the need for "structured" nodes at this early stage, or for a VOSpace to perform arbitrary file format or data model-based conversions. This might be useful to allow a VOSpace to be used to access tables stored natively in an RDBMS, but at present this is not sufficiently well defined, at least not in the written specification. It is not clear whether a VOSpace should natively provide such a capability when this will already be provided by the more object-oriented DAL interfaces such as TAP, SIA, etc. It might be best to first provide a solid VOSpace interface for basic "file" (simple byte stream) access before addressing object-oriented access.

The specification for StructuredData is very thin, and needs a lot more detail before people can use it in a meaningful way.

As for the specific examples of the DAL interfaces (TAP, SIA, etc.), StructuredData was added to the specification to support and complement these types of services, not to replace them. One target application of vospace is to provide support for importing data into these services.

Specific examples:

Images in SIA

The SIA specification provides tools for finding and getting images, but as yet, no API for importing images. In order to provide an import mechanism, a SIA service could also provide a vospace API. The user could use the vospace API to transfer images into the service, using the StructuredData node to specifically identify the files as FITS images.

  • If the user imported the files as UnstructuredData containing FITS images, the service would just store them as files on disk.

  • If the user imported them as StructuredData containing FITS images, then the service would attempt to interpret the contents of the files.
    • If the files did not contain valid data, then the service would reject them.
    • If the files did contain valid data, then the service would process the FITS headers and add them to the SIA database.
    • The images would then be available for access via the SIA interface.
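A rough Python sketch of the SIA import behaviour above. The FITS check only reads the 80-character cards of the primary header up to `END`, does not handle '/' inside string values, and the 'catalogue' is just a list of dicts standing in for the SIA database; all of this is a simplification for illustration, not how a production SIA service would do it. The function names are hypothetical.

```python
from typing import Dict, List

CARD = 80  # FITS header cards are 80 ASCII characters


def parse_fits_header(data: bytes) -> Dict[str, str]:
    """Simplified primary-header reader: keyword -> raw value text."""
    if not data.startswith(b"SIMPLE  ="):
        raise ValueError("not a FITS file")
    header: Dict[str, str] = {}
    for start in range(0, len(data), CARD):
        card = data[start:start + CARD].decode("ascii", errors="replace")
        key = card[:8].strip()
        if key == "END":
            return header
        if card[8:10] == "= ":
            # Drop any trailing '/ comment' (naive: breaks on '/' in strings).
            header[key] = card[10:].split("/", 1)[0].strip()
    raise ValueError("no END card in header")


def import_image(store: Dict[str, bytes], catalogue: List[Dict[str, str]],
                 name: str, data: bytes, structured: bool) -> None:
    if not structured:
        store[name] = data            # UnstructuredData: keep the bytes as-is
        return
    header = parse_fits_header(data)  # StructuredData: invalid data is rejected
    store[name] = data
    catalogue.append({"node": name, **header})  # stand-in for the SIA database
```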

VOTable in TAP

The TAP interface provides tools for searching and querying the database, but as yet, no API for importing data. In order to provide an import mechanism, a TAP service could also provide a vospace API. The user could use the vospace API to transfer tabular data into the service, using the StructuredData node to specifically identify the files as tabular data.

  • Again, the user may import UnstructuredData containing VOTable data, and the service would just store them as files on disk.

  • If the user imported them as StructuredData containing VOTable data, then the service would attempt to interpret the contents of the files.
    • If the files did not contain valid data, then the service would reject them.
    • If the files did contain valid data, then the service would process the VOTable metadata and create new database tables to contain the data.
    • The new tables would then be available for access via the TAP interface.
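Similarly for TAP, a service accepting StructuredData VOTable uploads might derive a table definition from the `FIELD` elements. A minimal Python sketch using the standard library XML parser; the datatype-to-SQL mapping is an assumption, and a real service would also need to load the rows and handle `arraysize`, units, UCDs and so on.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from VOTable datatypes to SQL column types.
SQL_TYPES = {
    "boolean": "BOOLEAN", "short": "SMALLINT", "int": "INTEGER",
    "long": "BIGINT", "float": "REAL", "double": "DOUBLE PRECISION",
    "char": "VARCHAR",
}


def local_name(tag: str) -> str:
    """Strip any XML namespace, e.g. '{http://...}FIELD' -> 'FIELD'."""
    return tag.rsplit("}", 1)[-1]


def create_table_sql(votable_xml: str, table_name: str) -> str:
    root = ET.fromstring(votable_xml)  # malformed XML raises ParseError
    fields = [el for el in root.iter() if local_name(el.tag) == "FIELD"]
    if not fields:
        raise ValueError("no FIELD elements: not usable as tabular data")
    columns = []
    for field in fields:
        datatype = field.get("datatype")
        sql_type = SQL_TYPES.get(datatype)
        if sql_type is None:
            raise ValueError(f"unsupported datatype {datatype!r}")
        name = field.get("name")
        columns.append(f'"{name}" {sql_type}')
    return f'CREATE TABLE "{table_name}" ({", ".join(columns)})'
```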

As a specific example of how this could be used, AstroGrid plans to provide a database service that contains static data from a large survey co-located with user data imported via vospace. Users would be able to use the vospace service to upload their own VOTable files into the user area of the service, and then use the TAP interface to create ADQL join queries to cross reference data from the large survey with their own data.

Comment #28

  • It seems overly complex to describe each node property as an independently resolvable URI. The obvious solution is more along the lines of a simple "name=value". Yes, names could fail to be globally unique, but this is possibly less of a problem than adoption of an overly-complex and under-specified approach (so far as I can tell, no standard properties, even for basic file attributes such as size, modify date, etc., have yet been defined). An alternative way to address the problem of uniqueness might be to use property names such as "type:name" where "type" defines a property namespace, e.g., "file". Then we could have one property instance (possibly optional) of something like "file:schema", the value of which would be a single URI pointing to something which defines all the defined names for that namespace. (This is similar to what we do for UTYPE namespaces already for example). While this could be extensible, the most common cases could be defined directly in the core standard, without any need to inspect the registry or any such outside service.

See above for how simple clients and services could use the core URIs as string constants.

We need to get the schema established and a core set of properties, protocols, etc. defined. Until then, people will keep raising this.

Comment #29

  • In "Views" it would be better to separate concerns such as the content type of a file (FITS, VOTable, etc.), from unrelated matters such as GZIP compression (ZIP is different yet since it is a multi-file container). Otherwise it is much harder for a client to sort out what a "View" offers; it would have to parse a View descriptor, understand all the options, see what is offered in this particular view and how that compares to what the client wants, and so on.

Views are poorly defined and probably not functional beyond simple file types.

We will need to address 'zipfile containing FITS images' for vospace 1.1.

Comment #30

  • For "unstructured" data nodes it should be possible to record primary data attributes such as the MIME type of the "file" (data node) directly, without having to deal with some indirect registry entry for a "View". The obvious thing would be a property such as file:MIMEType. If we really need object metadata at the level of VOStore, this could be a separate property namespace such as "table:NRecords".

A core property that represented 'file mime type' would be fine.

Comment #31

  • What does moveNode do, really? Is this a rename, as in Unix?

Yes

  • In vospace 1.0 it is equivalent to 'rename'.
  • In vospace 1.1 it becomes equivalent to the full 'mv' in unix.

Comment #32

  • Does pushToVOSpace deal with a single data node? It says it returns a list of URLs, but there is only a single destination node, so I would guess that the URLs refer to alternative transport protocols. How does the client decide which to use? Does it have to parse each URL to determine the protocol? (Possibly this is addressed in the WSDL, but semantic details like this should be addressed in the written specification as well.)

Yes, the list of protocol options represents different endpoints which provide access to the same data.

These could be alternative mirror services with the same protocol, or URLs for different protocols.
If asked to provide access to the data via http or ftp, the service response could contain:

  • two separate http URLs for different http mirrors
  • plus an ftp URL
All of which would provide access to the same data.

The client is free to use them in any order it wants to, but it can only reliably use each one once (some of the endpoints may be one-shot URLs with cookies embedded in the URL itself).

The client does not have to parse the URL, it only has to recognise the protocol URI. Again, these URIs can be hard coded into the client as part of the code that implements that protocol.
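The client-side logic described above, as a minimal Python sketch: the client matches each endpoint on its protocol URI alone, never parses the URL, and tries each usable endpoint at most once. The handler table is hypothetical; in a real client each entry would be the code that implements that protocol.

```python
from typing import Callable, Dict, List, Tuple

Endpoint = Tuple[str, str]        # (protocol URI, endpoint URL)
Handler = Callable[[str], bytes]  # code implementing one protocol


def fetch_with_fallback(endpoints: List[Endpoint],
                        handlers: Dict[str, Handler]) -> bytes:
    """Try each endpoint the client can handle, at most once each."""
    failures = []
    for protocol, url in endpoints:
        handler = handlers.get(protocol)  # match on the protocol URI only
        if handler is None:
            continue                      # protocol not implemented here
        try:
            return handler(url)           # single attempt: URL may be one-shot
        except OSError as exc:
            failures.append((url, exc))
    raise RuntimeError(f"no usable endpoint succeeded: {failures}")
```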

Comment #33

  • One of the things I was looking for was whether a URL could be obtained to directly GET or PUT a file, and this does appear to be the case (e.g., pullFromVOSpace). This would allow us, for example, to have a data service deliver a message to a client giving the access reference URL for a data object, once it has been generated by an asynchronous service. This might allow the VOSpace machinery to be hidden from a simple client, for example when functionality such as VOSpace and UWS are used within a TAP or SIA data service.

Yes - this was one of the use cases we had in mind when designing the interface.

  • A vospace client can handle the transfer negotiation, requesting access using 'standard HTTP GET' or 'standard FTP GET' in the list of protocols.
  • The response would contain standard endpoint URLs for the requested protocols.
  • The vospace client can then pass these URLs to a separate application which does the data transfer.

What we haven't addressed is how long a data-access URL remains valid. It may be that this is implementation-dependent (a.k.a "random"), which is OK when the client for the space and the data transfer are the same but difficult if they are separate as per Doug's case. Can we put a "should" clause on duration of validity? Can we specify what should happen if the data-access URL is stale? (E.g. is access forbidden? Does the resource simply disappear?). -- GuyRixon - 22 Aug 2007
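One way to make the hand-off in Doug's case explicit is to pass the negotiated URL on together with whatever validity information the service gives. The `expires` field below is hypothetical (nothing in the current specification says how long a URL stays valid); this Python sketch only shows how a separate transfer tool could check it before use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TransferHandoff:
    """What a VOSpace client might pass on to a separate transfer tool."""
    protocol: str                        # protocol URI that was negotiated
    url: str                             # endpoint URL returned by the service
    expires: Optional[datetime] = None   # hypothetical: not in the current spec

    def is_stale(self, now: Optional[datetime] = None) -> bool:
        if self.expires is None:
            return False  # lifetime unknown: the tool can only try the URL
        now = now or datetime.now(timezone.utc)
        return now >= self.expires
```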

-- DaveMorris - 21 Aug 2007
