VOSpacePlans20080609 (r8 - 01 Jul 2008 - 11:51:42 - DaveMorris)
A list of things that could/should be done for VOSpace.

The primary concern will be getting VOSpace 1.1 deployed and integrated with the rest of the system.

1 (CORE) VOSpace 1.1 (SOAP) interface

Work in progress to finish implementation and testing.

1.1 Core methods

The majority of the core methods are done, but there are a few remaining issues to solve, primarily with exception handling.

1.2 Find nodes

The regex find method is not implemented yet. In theory, this re-uses a lot of the code already in place for the list nodes method.
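As a rough illustration of how find could sit on top of list, the sketch below filters the output of a list call with a regular expression. The `list_nodes` function, the in-memory node table and the example URIs are all hypothetical stand-ins rather than the service API, and Python's `re` syntax is used only because the regex syntax for find has not been agreed yet.

```python
import re

# Hypothetical in-memory stand-in for the node store; the URIs are made up.
NODES = [
    "vos://example.authority!vospace/images/m31.fits",
    "vos://example.authority!vospace/images/m33.fits",
    "vos://example.authority!vospace/tables/catalogue.vot",
]


def list_nodes():
    """Stand-in for the existing list nodes method: yields every node URI."""
    return iter(NODES)


def find_nodes(pattern):
    """Return the listed node URIs that match the given regular expression."""
    matcher = re.compile(pattern)
    return [uri for uri in list_nodes() if matcher.search(uri)]
```

Here `find_nodes(r"\.fits$")` would return the two image nodes, using the same iteration code path as the list call.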

1.3 Finish IVOA specification

There were no major issues raised at the interop conference. However, we need to finalize the document and XML schema.

  • SOAP specific bugs
  • Link specific bugs
  • Capability (properties identified by URI)
  • Protocol (properties identified by URI)
  • Format (properties identified by URI)
  • Regex syntax for find

1.4 Interop testing

Setting up the interop tests and comparing behaviour between different implementations. Hopefully most of the work will be in setting up the tests.

1.4.1 Validation tools

Is it worth packaging our JUnit tests as a validation suite to test other IVOA services?

2 (low) Back end server improvements

There are quite a few things to do on the back end server. Most, if not all, of them are not critical. The current service works, BUT it is slower than it should be and very difficult to administer.

There are three reasons for making changes to the back end server:

  • Fixing bad design
  • Improve system administration
  • Enable things we want in the future

2.1 Fixing bad design

The system was not 'designed' as such. It was more a case of an evolving code base that grew as each part of the specification was implemented. As a result, some of the early design choices caused problems that needed extra code to work around them later.

These problems cost us in code complexity (hence maintainability), reliability and performance. The aim of the refactoring is to simplify the system, removing or refactoring the earlier mistakes.

This isn't a top down re-write, more of a pause to fix a couple of the nastiest mistakes before building too much more on top. A lot of the work in developing the system went into building a comprehensive set of JUnit tests. This should make it easier to refactor parts of the system without having unintended side effects.

2.1.1 Refactor class inheritance

Work in progress to simplify the code structure and improve performance. The current code base relies on Hibernate to handle inheritance and polymorphism. In the course of developing this version we have learned a lot about what Hibernate can (and can't) do. As a result the current code base uses an inconsistent mixture of Hibernate and our own customizations to handle inheritance and polymorphism. Although we will probably still use a mixture of techniques, we need to make it more consistent, using the same set of techniques throughout rather than a random mixture.

  • node inheritance
  • deleted nodes
  • protocol handlers
  • format handlers

2.1.2 Sanity check on database structure

There are a number of options that could potentially improve performance. We could just add a second level database cache, based on the assumption that it will improve performance. However, without real numbers we would be guessing that this is the case. The main part of this task will be to set up a test environment that enables us to collect real performance figures from different configurations and compare them.

  • alternative database systems (javadb and mysql)
  • database cache testing
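A minimal sketch of the kind of harness this implies, assuming each configuration can be exercised through a plain callable. The `measure` and `compare` names are made up for illustration; a real test environment would run the same node operations against each database or cache setup.

```python
import statistics
import time


def measure(operation, repeats=5):
    """Run the operation repeatedly and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(configurations, repeats=5):
    """Map each configuration name to its median time so they can be compared side by side."""
    return {name: measure(op, repeats) for name, op in configurations.items()}
```

Using the median rather than the mean is a design choice here: it stops one slow run (for example a cold cache) from dominating the figure.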

2.2 Improve system administration

If we want external providers to deploy vospace services then we need to do some more work on making it easier for system admins to manage.

2.2.1 Storage space

Admin tools to manage storage space.
  • storage space quotas
  • change store location
  • tools to backup the file store
  • tools to backup the database
  • improve admin interface

2.2.2 (core) Background threads

Work already in progress to create and manage background threads that delete unused files and recover resources.
  • resource recovery
  • history and logging
  • improve admin interface
  • testing

2.2.3 (core) Installation

Simpler service deployment using defaults. The system is designed to support multiple interfaces and protocols, including the legacy myspace and at least two versions of the vospace interface. As a result, there are lots of configuration options, many of which won't make sense to a first time installer.

We need to create a simplified 'just do it' install sequence that asks a simple set of questions and then configures the system using a standard set of defaults. The administrator can then explore the more complex alternatives once the basic system is up and running.

  • improve admin interface
  • simple installs for myspace and vospace services
  • testing
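One way to picture the 'just do it' install is a standard set of defaults with only the installer's answers laid on top. This is a sketch only: the option names and default values below are invented for illustration and are not the real configuration keys.

```python
# Hypothetical defaults; the real option names and values will differ.
DEFAULTS = {
    "store_location": "/var/vospace/store",
    "interfaces": ["myspace", "vospace-1.1"],
    "buffer_size": 8192,
}


def build_config(answers):
    """Overlay the installer's answers on the defaults, rejecting unknown options."""
    unknown = set(answers) - set(DEFAULTS)
    if unknown:
        raise KeyError("unknown options: %s" % ", ".join(sorted(unknown)))
    config = dict(DEFAULTS)
    config.update(answers)
    return config
```

A first time installer would answer only the questions they care about, e.g. `build_config({"store_location": "/data/vospace"})`, and everything else falls back to the defaults.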

2.2.4 Configurable buffer data streams

Work already in progress to improve data transfer rates. This will enable the system admin to change the level of buffering on data transfers dynamically, based on client use, available bandwidth and system memory.
  • improve admin interface
  • testing
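To make the idea concrete, here is a small sketch of a copy loop whose buffer size can be changed while a transfer is running. The `BufferSettings` class and `copy_stream` function are hypothetical names, and the real server code will look different; the point is only that the size is read on every pass rather than fixed at the start.

```python
class BufferSettings:
    """Holds the buffer size so an admin tool could change it at runtime (hypothetical)."""

    def __init__(self, size=8192):
        self.size = size


def copy_stream(source, target, settings):
    """Copy source to target, checking the configured buffer size before each read."""
    total = 0
    while True:
        chunk = source.read(settings.size)
        if not chunk:
            break
        target.write(chunk)
        total += len(chunk)
    return total
```

Because `settings.size` is looked up inside the loop, a change made through the admin interface would take effect on the next read of any transfer already in progress.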

2.2.5 Admin interface

The current configuration and management interface is limited by the static JSP pages.

Trying to implement a complex admin interface using hand-written JSP pages is costing us a lot of developer time. It would be worth investigating one of the popular Ajax libraries, possibly combined with the Spring framework underneath. I suspect that this would produce a better result, and save developer time in the long run.

  • learn how to use Ajax libraries
  • learn how to use Spring framework

2.3 (skip) Enable things we want in the future

2.3.1 Structured data handling

The vospace specification includes support for what it calls 'Structured data'. This means the vospace service understands and interprets the contents of the data, rather than just treating it as a binary file.

The hope is that as more vospace providers implement structured data handling, we will move away from a simple file system model, where the data is untyped, to a more structured system where the content has detailed metadata associated with it.

2.3.1.1 Archive formats

The vospace specification allows services to accept archive formats, e.g. tar or zip files, and unpack them as directories on the server.

Handling archive files represents the first step in a series of handlers for different file types.
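A minimal sketch of the unpacking step for the zip case, assuming the archive has already been received as bytes and that nodes map onto directories in a local file store. The function name and arguments are made up for illustration; tar files would need similar handling via the `tarfile` module.

```python
import io
import os
import zipfile


def unpack_archive_as_container(archive_bytes, store_root, container_name):
    """Unpack a zip archive into a new directory and return the relative file paths created."""
    target = os.path.join(store_root, container_name)
    os.makedirs(target)
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        archive.extractall(target)
    created = []
    for folder, _subdirs, files in os.walk(target):
        for name in files:
            full = os.path.join(folder, name)
            created.append(os.path.relpath(full, target).replace(os.sep, "/"))
    return sorted(created)
```

Each path returned would correspond to a new child node under the container, which is where the real service would need to create the matching node metadata.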

2.3.1.2 Image collections

The vospace specification allows services to declare collections of specific types of data and provide additional capabilities depending on the data types. An example of this is providing a SIAP capability for a collection of images.

This level of functionality is still theoretical, but we need to start experimenting with prototypes.

2.3.1.3 Database tables

One of the long term goals is to be able to use vospace as a way of importing votable data into database tables, and being able to access the new tables via a DSA or TAP interface.

This level of functionality is still theoretical, but we need to start experimenting with prototypes.

2.3.2 Authentication

Identifying who the user is based on the certificate in the web service message.

2.3.2.1 Extract identity from SOAP call

Using the AstroGrid web service security components to identify the users from the certificate in the SOAP message, and adding functionality to the vospace back end to use the identity to assign ownership to nodes.

In theory, this should not be a huge task. It will require some changes to the vospace server, but most of the work will be in setting up the test environment to handle certificates and testing that the system behaves correctly.

2.3.3 (CORE) Authorization

Controlling who is allowed to do what to data within the vospace service.

2.3.3.1 (CORE) Simple rules

The initial simple form will be based on 'owner' and 'other'. Once we can identify who is making a web service call, then the initial set of rules will allow 'owner' to modify a node, and restrict 'other' to read access only.
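The owner/other rule is small enough to sketch directly. The function below is illustrative only: the action names 'read' and 'modify' are assumptions, and so is the choice to treat an unidentified caller (`None`) as 'other'.

```python
def is_allowed(node_owner, caller, action):
    """Owner may do anything to the node; everyone else may only read it."""
    if caller is not None and caller == node_owner:
        return True
    return action == "read"
```

So `is_allowed("dave", "dave", "modify")` is allowed, while the same call from any other identity is refused unless the action is a read.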

2.3.3.2 (skip) Complex rules

This is a longer term project to handle more complex access control rules. This involves implementing a rule engine within the vospace service itself, and working with other groups within the IVOA to define a common management interface.

3 (CORE) VOSpace 1.1 (SOAP) delegates

A good part of this may already have been done by Paul. So for the moment I'll just list the things we will need to have. If Paul has already done them, then we can just tick them off and move on to the next bit.

3.1 Server delegate, used by other services (CEA/DSA etc).

  • Capable of resolving vos:// URIs
  • Capable of reading/writing vospace 1.0
  • Capable of reading/writing vospace 1.1
  • Able to resolve and traverse links transparently
  • Able to use multiple endpoint addresses (failover)

This is a fairly urgent task, but to test it we will need to have finished the vospace 1.1 service first.

Although it may not get used for a while, the sooner we have this working, the sooner we can start to add this to our other services (CEA/DSA etc). The sooner we add this capability to our existing services, the fewer legacy services we will have to deal with when we change over to using the vos: identifiers in the vodesktop client.

This may need to be slightly different to the delegate designed for use by the desktop client. The primary use case for the server side delegate is to open and close data transfer streams quickly and reliably. This may require specific changes or optimizations to the delegate to make data access simple and easy to use.
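As an illustration of the failover requirement in the list above, the sketch below tries each endpoint address in turn and returns the first stream that opens. The `opener` callable is a hypothetical hook standing in for whatever the delegate uses to open a transfer, and treating `OSError` as the failure signal is an assumption.

```python
def open_with_failover(endpoints, opener):
    """Try each endpoint in order and return the first stream that opens."""
    errors = []
    for endpoint in endpoints:
        try:
            return opener(endpoint)
        except OSError as error:
            # Remember why this endpoint failed, then move on to the next one.
            errors.append((endpoint, error))
    raise ConnectionError("all endpoints failed: %r" % (errors,))
```

Collecting the individual errors before giving up means the caller gets a report of every endpoint that was tried, which should help with the improved error reporting we want from the transfer code.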

3.2 Data transfer library, used by other services (CEA/DSA etc).

  • Improve performance of buffering
  • Read/write stream interface
  • Read/write File interface
  • Configurable data buffer
    • Testing
  • Improved error reporting
    • Testing
  • Failover and retry
    • Testing

A lot of the user feedback for the existing vospace/myspace system has been concerns with performance and reliability of data transfers. This task should complement the work already being done on the server side to improve the performance and reliability of data transfers.

A key part of this will be to set up a test system to verify the performance and reliability before we deploy the new component in our other services.

3.3 Client side delegate

  • Capable of resolving vos:// URIs
  • Capable of accessing/exploring vospace 1.0
  • Capable of accessing/exploring vospace 1.1
  • Able to resolve and traverse links (with notification callback)
  • Able to use multiple endpoint addresses (failover)
  • Able to resolve alternative access URIs

This may need to be slightly different to the delegate designed for use by the other services. The primary use of vospace in the desktop client is to generate the metadata tree for the file explorer and file selection tools. This may require specific changes or optimizations to handle the detailed metadata tree. In particular, we may need to provide tools to resolve vos:// URIs to the corresponding ivo:// URIs using the service and node capabilities.

4 (ivoa) VOSpace 2.x (REST) interface

We need to lead the IVOA discussion on this, implementing prototypes and proposing versions of the schema. There is a lot of support within the IVOA for the REST interface, but it will need careful handling to reach a balance between the differing requirements.

Some of the groups interested in implementing vospace 2.x have not been directly involved in the development of vospace 1.x, so our experience should put us in a good position. However, if we don't join in early, we may lose our lead.

A key aim for us would be to base vospace 2.x on what we have learned from vospace 1.x, BUT avoid the problems we found in vospace 1.x.

5 (CORE) Desktop integration

A number of things need to be in place before we can use vos:// URIs for users' home space identifiers in the vodesktop.

5.1 Basic URI handling

  • All our vospace services need to be upgraded to vospace 1.1.
  • All our CEA and DSA services need to be able to handle vos:// identifiers.
  • The vodesktop client needs to be updated to handle vos:// identifiers.
  • The vodesktop client needs to be able to explore data in a vospace system.

5.2 Security

In theory we can deploy vospace without the security system. However, unless there is an urgent reason for doing so I would suggest we should get the basic owner/other security in place before we make the changeover.

At the moment, users can access other people's data using the myspace interface, but only by modifying the ivo: URIs manually. Vospace is designed to show the whole VO space as one integrated system, promoting data sharing between users. If we deploy the vospace system without some basic level of security, everyone will be able to see, and modify, everyone else's data directly from the vodesktop.

This means that our vospace services need to have the basic owner/other security policy in place, and all of our CEA/DSA services will need to be able to handle certificate delegation.

During the transition we could configure our services to provide access via both a secure and a non-secure access endpoint. To make use of this we will need some extra functionality in the client to check whether a service is capable of using a secure or insecure access point.

5.3 Community

As far as I know, most of the changes required to the Community service are already in place. There may be some minor issues with account creation still to solve.

5.4 (SKIP) Transition period

During the transition we will need to operate a mixed system, where some services and clients can handle vos: identifiers and others cannot. Although we should do as much as possible to minimise the transition period, it is unlikely that we will be able to update all of the clients and services in one go.

As a result, the vodesktop client will need to be able to convert between vos: and ivo: identifiers depending on the capabilities of a particular service.

The metadata properties required to drive the conversion functions are already in the relevant IVOA standards.

5.4.1 'access vospace 1.1' capability

The first step is to define a new service capability identifier that represents 'can access vospace 1.1'. Then, when we update our services to include the vospace 1.1 delegate, we also add the 'can access vospace 1.1' capability to the service registration.

The new capability URI does not have to have an interface or endpoint associated with it. It just acts as a marker to identify CEA/DSA services that can resolve vos: identifiers and access data in a vospace 1.1 service.

This will enable a client to determine which services are capable of using the new protocol and which are not.
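On the client side the check could be as simple as looking for the marker in the list of capabilities from the registration. In this sketch the marker URI is a made-up placeholder (the real identifier still has to be defined), and representing a service registration as a dictionary with a `capabilities` list is an assumption.

```python
# Hypothetical marker URI; the real capability identifier is still to be defined.
VOSPACE_11_ACCESS = "ivo://example.org/capability#access-vospace-1.1"


def can_access_vospace_11(registration):
    """Return True if the service registration lists the marker capability."""
    return VOSPACE_11_ACCESS in registration.get("capabilities", ())
```

A registration without the marker, or without any capabilities at all, is simply treated as a legacy service.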

5.4.2 (skip) 'myspace alternative' capability

The second step uses the node capabilities list generated by a vospace 1.1 service, which lists the alternative ways of accessing a node.

The AstroGrid vospace services will implement both the legacy myspace and the new vospace 1.1 interface. If a vospace node is also accessible via a myspace interface, then the node capabilities metadata will contain details of how to use the myspace interface to access the node.

This means that the vodesktop client can use the vospace node metadata to get the corresponding myspace identifier for the same node.

5.4.3 (skip) URI translation

If a user has been allocated a vospace URI for their home space, then the vodesktop will use the vospace protocol to retrieve and display their vospace tree.

When the user sends a task to a CEA or DSA service the vodesktop will need to check the service registration for the 'can access vospace 1.1' capability.

[Figure: URI-conversion.png, flow diagram for converting URIs]

If the service is capable of handling vos: identifiers, then the task document can be sent to the service without modification.

If the target service is not capable of using vospace vos: URIs, then the client will need to translate vos: URIs into the corresponding myspace ivo: URIs before the task is sent to the service.

To do this, the client will have to check the vospace node metadata for an alternative myspace access URI.

If the vospace service does provide a myspace alternative then the client can substitute the alternative URI in the task document before sending it to the CEA/DSA service.

If the vospace service does not provide a myspace alternative, then the task will have to be rejected.

Doing the translation at this stage, just before the task is sent, means that we can also trap tasks submitted via the Python scripting interface as well as those created using the UI tools.
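The decision steps above can be sketched as a single function applied to each identifier in a task document just before it is sent. The names here are all hypothetical: `service_handles_vos` stands for the result of the capability check on the target service, and `myspace_alternative` is an assumed lookup that returns the alternative ivo: URI for a node, or `None` if the vospace service does not offer one.

```python
class TaskRejected(Exception):
    """Raised when the target service has no way of reaching a vospace node."""


def translate_uri(uri, service_handles_vos, myspace_alternative):
    """Return the identifier to send to the service, translating vos: URIs if needed."""
    if not uri.startswith("vos:"):
        return uri  # not a vospace identifier, nothing to translate
    if service_handles_vos:
        return uri  # the service can resolve vos: identifiers itself
    alternative = myspace_alternative(uri)
    if alternative is None:
        raise TaskRejected("service can't access vospace: " + uri)
    return alternative
```

Because the function works on one identifier at a time, the same code could be called for tasks built in the UI and for tasks submitted through the scripting interface.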

There are a couple of problems still to solve, but basically this should work.

Known unknowns

  • How to handle the myspace URI that is returned by the DSA/CEA
  • How to handle vospace .auto names

5.4.4 (skip) File selection

If a CEA/DSA service cannot handle a vos: URI and the vospace service does not offer a myspace alternative, then the user will receive a "service can't access vospace" error when the task is submitted.

To mitigate this, we could modify the file selection dialogue to check the service metadata and modify the list of available nodes depending on the service capabilities. If the user is building a task for a service that cannot access vospace, then the file selection window can prevent them from selecting a node in vospace that does not have an alternative myspace endpoint.

[Figure: node-selection.png, flow diagram for determining node access]

One way to present this to the user would be to 'grey out' nodes that the selected service cannot access.

In some cases we will not be able to avoid the 'unknown' state shown in the diagram. If the file selection window does not know which service the task is for, or it cannot determine the service capabilities, then it will not be able to check if the service can access the node. The most common case where this will occur is for CEA applications that have more than one service instance.

In theory we could process the metadata for all the services, and base the result on the common set of capabilities. However, this would mean excluding nodes that some of the services can access. Alternatively, we could enable nodes that at least one service can access, but this would mean that selecting a particular node could change the list of services that the task would be valid for.
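To make the two alternatives concrete, here is a small sketch that models each service as the set of access protocols it can use. The protocol names and the three-valued `Access` result are assumptions for illustration; the 'unknown' value corresponds to the case where the service capabilities cannot be determined.

```python
from enum import Enum


class Access(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


def node_access(service_protocols, node_protocols):
    """Compare what a service can use with what a node offers; None means not known."""
    if service_protocols is None:
        return Access.UNKNOWN
    return Access.YES if service_protocols & node_protocols else Access.NO


def common_capabilities(services):
    """Protocols that every service instance supports (the strict option)."""
    return set.intersection(*services) if services else set()


def any_capabilities(services):
    """Protocols that at least one service instance supports (the permissive option)."""
    return set.union(*services) if services else set()
```

With two instances of one CEA application, one supporting both myspace and vospace 1.1 and one supporting only myspace, the strict option would grey out a vospace-only node, while the permissive option would leave it selectable but tie the task to the first instance.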

For now, we should start with the simple case where we know that the service either can or cannot access the node, and figure out how to display this information to the user in some way. If anyone has suggestions on how to handle the more complex case please let me know.

Note that this is not a 'one off' situation caused by the transition from myspace to vospace. At the time of writing, several groups within the IVOA are interested in implementing the REST vospace 2.x service interface, but not the SOAP vospace 1.x interface. This means that the VO will always have a mixture of different services, each capable of accessing a different subset of vospace interfaces.

By learning how to solve this problem now, we will be able to cope with a mixture of different versions and capabilities when they become available in the wider VO.

Known unknowns

  • How to display this to the user
  • Multiple services for a single CEA
  • Vospace identifiers embedded in TAP/ADQL queries

5.4.5 (skip) Changes required

5.4.5.1 (skip) File selection context

In order to selectively 'grey out' nodes that the target service can't access, the file selection window needs access to the metadata for the service that the task is being created for.

5.4.5.2 (skip) Capability detection

The vospace client package needs to provide a helper that can interpret the metadata for a CEA/DSA service and a vospace node, and indicate whether the service will be able to access the node.

5.4.5.3 (skip) URI conversion

The vospace client package needs to provide a tool for converting a vos: URI into the corresponding ivo: URI if the vospace service provides an alternative myspace interface.

The vodesktop client needs to be able to trap a CEA/DSA task before it is sent and check if the target service can access the vospace nodes referred to in the task parameters.

Note, the vodesktop already has a hook in place to process the different forms of ivo: identifiers in a task document. If this hook also has access to the CEA/DSA service metadata, then the vos: to ivo: translation can be added at this point.

-- DaveMorris - 13 Jun 2008

Topic attachments

  • URI-conversion.png (14.3 K, 13 Jun 2008 - 13:03, DaveMorris): flow diagram for converting URIs
  • node-selection.png (17.6 K, 13 Jun 2008 - 14:04, DaveMorris): flow diagram for determining node access