A list of things that could/should be done for VOSpace.
The primary concern will be getting VOSpace 1.1 deployed and integrated with the rest of the system.
1 (CORE) VOSpace 1.1 (SOAP) interface
Work in progress to finish implementation and testing.
1.1 Core methods
The majority of the core methods are done, but there are a few remaining issues
to solve, primarily with exception handling.
1.2 Find nodes
The regex find method is not implemented yet.
In theory, this re-uses a lot of the code already in place for the list nodes method.
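As a rough illustration of that re-use, here is a minimal Python sketch in which find is just a regex filter over the result of list nodes. The names `list_nodes` and `find_nodes` and the sample URIs are made up for the example, and the regex syntax is Python's rather than whatever the specification finally settles on.

```python
import re

# Invented sample data standing in for the nodes held by the service.
SAMPLE_NODES = [
    "vos://example.org!vospace/data/image-001.fits",
    "vos://example.org!vospace/data/image-002.fits",
    "vos://example.org!vospace/data/notes.txt",
]


def list_nodes(nodes=SAMPLE_NODES):
    """Placeholder for the existing list nodes code path."""
    return list(nodes)


def find_nodes(pattern, nodes=SAMPLE_NODES):
    """Re-use list_nodes and keep only the URIs that match the regex."""
    matcher = re.compile(pattern)
    return [uri for uri in list_nodes(nodes) if matcher.search(uri)]
```

If the real find method can be layered on top of list nodes like this, most of the remaining work is agreeing the regex syntax.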
1.3 Finish IVOA specification
There were no major issues raised at the interop conference.
However, we need to finalize the document and XML schema.
- SOAP specific bugs
- Link specific bugs
- Capability (properties identified by URI)
- Protocol (properties identified by URI)
- Format (properties identified by URI)
- Regex syntax for find
1.4 Interop testing
Setting up the interop tests and comparing behaviour between different implementations.
Hopefully most of the work will be in setting up the tests.
1.4.1 Validation tools
Is it worth packaging our JUnit tests as a validation suite
to test other IVOA services?
2 (low) Back end server improvements
There are quite a few things to do on the back end server.
Most, if not all, of them are not critical.
The current service works, BUT it is slower than it should be and very difficult to administer.
There are three reasons for changing the back end server:
- Fixing bad design
- Improve system administration
- Enable things we want in the future
2.1 Fixing bad design
The system was not 'designed' as such.
It was more a case of an evolving code base that grew as each part of the specification was implemented.
As a result, some of the early design choices caused problems that needed extra code to work around them later.
These problems cost us in code complexity (hence maintainability), reliability and performance.
The aim of the refactoring is to simplify the system, removing or refactoring the earlier mistakes.
This isn't a top down re-write, more of a pause to fix a couple of the nastiest mistakes before building too much more on top.
A lot of the work in developing the system went into building a comprehensive set of JUnit tests.
This should make it easier to refactor parts of the system without having unintended side effects.
2.1.1 Refactor class inheritance
Work in progress to simplify the code structure and improve performance.
The current code base relies on Hibernate to handle inheritance and polymorphism.
In the course of developing this version we have learned a lot about what Hibernate can (and can't) do.
As a result the current code base uses an inconsistent mixture of Hibernate and our own customizations
to handle inheritance and polymorphism.
Although we will probably still use a mixture of techniques, we need to make it more consistent,
using the same set of techniques throughout rather than a random mixture.
- node inheritance
- deleted nodes
- protocol handlers
- format handlers
2.1.2 Sanity check on database structure
There are a number of options that could potentially improve performance.
We could just add a second level database cache, based on the assumption that it will improve performance.
However, without real numbers we would be guessing that this is the case.
The main part of this task will be to set up a test environment that enables us to collect real performance figures from different configurations
and compare them.
- alternative database systems (javadb and mysql)
- database cache testing
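The comparison could start from something as small as the following Python sketch, which times the same operation under several named configurations and reports the median. The `measure` and `compare` helpers are hypothetical, and the operations passed in would be whatever test workload we settle on for each database or cache setup.

```python
import statistics
import time


def measure(operation, repeats=5):
    """Run the operation several times; return the median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(configurations, repeats=5):
    """Map each configuration name to the median timing of its workload."""
    return {
        name: measure(operation, repeats)
        for name, operation in configurations.items()
    }
```

Using the median rather than the mean is a choice made for this sketch, to reduce the effect of one-off pauses such as cache warm-up.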
2.2 Improve system administration
If we want external providers to deploy vospace services
then we need to do some more work on making it easier
for system admins to manage.
2.2.1 Storage space
Admin tools to manage storage space.
- storage space quotas
- change store location
- tools to backup the file store
- tools to backup the database
- improve admin interface
2.2.2 (core) Background threads
Work already in progress to create and manage background threads that delete unused files and recover resources.
- resource recovery
- history and logging
- improve admin interface
- testing
2.2.3 (core) Installation
Simpler service deployment using defaults.
The system is designed to support multiple interfaces and protocols,
including the legacy myspace and at least two versions of the vospace interface.
As a result, there are lots of configuration options, many of which won't make sense to
a first time installer.
We need to create a simplified 'just do it' install sequence that asks a simple set of questions
and then configures the system using a standard set of defaults.
The administrator can then explore the more complex alternatives once the basic system is up and running.
- improve admin interface
- simple installs for myspace and vospace services
- testing
2.2.4 Configurable buffer data streams
Work already in progress to improve data transfer rates.
This will enable the system admin to change the level of buffering on data transfers
dynamically, based on client use, available bandwidth and system memory.
- improve admin interface
- testing
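A minimal Python sketch of the idea, assuming the buffer size lives in a shared settings object that the admin interface can change while a transfer is running. `TransferSettings` and `copy_stream` are names invented for this example, not part of the existing code.

```python
import io


class TransferSettings:
    """Holds the buffer size so an admin tool could change it at run time."""

    def __init__(self, buffer_size=8192):
        self.buffer_size = buffer_size


def copy_stream(source, target, settings):
    """Copy source to target, re-reading the buffer size before every chunk."""
    total = 0
    while True:
        chunk = source.read(settings.buffer_size)
        if not chunk:
            break
        target.write(chunk)
        total += len(chunk)
    return total


# Example use with in-memory streams.
source = io.BytesIO(b"x" * 20000)
target = io.BytesIO()
settings = TransferSettings(buffer_size=4096)
copied = copy_stream(source, target, settings)
```

Because the size is read on each pass through the loop, a change made by the administrator takes effect on the next chunk rather than the next transfer.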
2.2.5 Admin interface
The current configuration and management interface is limited by the static JSP pages.
Trying to implement a complex admin interface using hand-written JSP pages is costing us a lot of developer time.
It would be worth investigating one of the popular Ajax libraries, possibly combined with the Spring framework underneath.
I suspect that this would produce a better result, and save developer time in the long run.
- learn how to use Ajax libraries
- learn how to use Spring framework
2.3 (skip) Enable things we want in the future
2.3.1 Structured data handling
The vospace specification includes support for what it calls 'Structured data'.
This means the vospace service understands and interprets the contents
of the data, rather than just treating it as a binary file.
The hope is that as more vospace providers implement
structured data handling, we will move away from a simple
file system model, where the data is untyped, to a more structured system
where the content has detailed metadata associated with it.
2.3.1.1 Archive formats
The vospace specification allows services to accept archive formats,
e.g. tar or zip files, and unpack them as directories on the server.
Handling archive files represents the first step in a series of
handlers for different file types.
2.3.1.2 Image collections
The vospace specification allows services to declare
collections of specific types of data and provide additional
capabilities depending on the data types.
An example of this is providing a SIAP capability for
a collection of images.
This level of functionality is still theoretical,
but we need to start experimenting with prototypes.
2.3.1.3 Database tables
One of the long term goals is to be able to use vospace as
a way of importing votable data into database tables, and
being able to access the new tables via a DSA or TAP interface.
This level of functionality is still theoretical,
but we need to start experimenting with prototypes.
2.3.2 Authentication
Identifying who the user is based on the certificate in the web service message.
2.3.2.1 Extract identity from SOAP call
This involves using the AstroGrid web service security components to
identify the user from the certificate in the SOAP message,
and adding functionality to the vospace back end to use the
identity to assign ownership to nodes.
In theory, this should not be a huge task.
It will require some changes to the vospace server, but most of the work will
be in setting up the test environment to handle certificates and testing
that the system behaves correctly.
2.3.3 (CORE) Authorization
Controlling who is allowed to do what to data within the vospace service.
2.3.3.1 (CORE) Simple rules
Initial simple form will be based on 'owner' and 'other'.
Once we can identify who is making a web service call, then the initial set of rules
will allow 'owner' to modify a node, and restrict 'other' to read access only.
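The initial rule set is small enough to write down directly. The Python sketch below is only an illustration. The `Node` class, the action names and `is_allowed` are hypothetical, and it assumes the caller's identity has already been extracted from the web service call.

```python
from dataclasses import dataclass

READ = "read"
MODIFY = "modify"


@dataclass
class Node:
    uri: str
    owner: str


def is_allowed(user, node, action):
    """'owner' may read and modify; 'other' is restricted to read access."""
    if user == node.owner:
        return action in (READ, MODIFY)
    return action == READ
```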
2.3.3.2 (skip) Complex rules
This is a longer term project to handle more complex access control rules.
This involves implementing a rule engine within the vospace service itself,
and working with other groups within the IVOA to define a common management
interface.
3 (CORE) VOSpace 1.1 (SOAP) delegates
A good part of this may already have been done by Paul.
So for the moment I'll just list the things we will need to have.
If Paul has already done them, then we can just tick them off and move on to the next bit.
3.1 Server delegate, used by other services (CEA/DSA etc).
- Capable of resolving vos:// URIs
- Capable of reading/writing vospace 1.0
- Capable of reading/writing vospace 1.1
- Able to resolve and traverse links transparently
- Able to use multiple endpoint addresses (failover)
This is a fairly urgent task, but to test it we will need to have finished the vospace 1.1 service first.
Although it may not get used for a while, the sooner we have this working, the sooner
we can start to add this to our other services (CEA/DSA etc).
The sooner we add this capability to our existing services, the fewer
legacy services we will have to deal with when we change over to
using the vos: identifiers in the vodesktop client.
This may need to be slightly different to the delegate
designed for use by the desktop client.
The primary use case for the server side delegate is to
open and close data transfer streams quickly and reliably.
This may require specific changes or optimizations to
the delegate to make data access simple and easy to use.
3.2 Data transfer library, used by other services (CEA/DSA etc).
- Improve performance of buffering
- Read/write stream interface
- Read/write File interface
- Configurable data buffer
- Improved error reporting
- Failover and retry
A lot of the user feedback for the existing vospace/myspace system
has been concerns with performance and reliability of data transfers.
This task should complement the work already being done on
the server side to improve the performance and reliability of data transfers.
A key part of this will be to set up a test system to verify the
performance and reliability before we deploy the new component
in our other services.
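To make the 'failover and retry' and 'improved error reporting' items a little more concrete, here is a small Python sketch. The `with_failover` helper is hypothetical. It assumes a transfer is a function that takes an endpoint address and raises `OSError` on failure, and it makes no attempt at back-off between retries.

```python
def with_failover(endpoints, operation, retries=2):
    """Try the operation on each endpoint in turn, retrying each a few times.

    Every failure is recorded so that the final error reports what was tried.
    """
    errors = []
    for endpoint in endpoints:
        for attempt in range(1, retries + 1):
            try:
                return operation(endpoint)
            except OSError as error:
                errors.append((endpoint, attempt, str(error)))
    raise ConnectionError(f"all endpoints failed: {errors}")
```

The same wrapper could in principle be shared by the server side and client side delegates, since both need to use multiple endpoint addresses.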
3.3 Client side delegate
- Capable of resolving vos:// URIs
- Capable of accessing/exploring vospace 1.0
- Capable of accessing/exploring vospace 1.1
- Able to resolve and traverse links (with notification callback)
- Able to use multiple endpoint addresses (failover)
- Able to resolve alternative access URIs
This may need to be slightly different to the delegate
designed for use by the other services.
The primary use of vospace in the desktop client
is to generate the metadata tree for the file explorer
and file selection tools.
This may require specific changes or optimizations to
handle the detailed metadata tree.
In particular, we may need to provide tools
to resolve vos:// URIs to the corresponding ivo:// URIs
using the service and node capabilities.
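The link traversal with a notification callback, listed above, might look something like this Python sketch. It is an assumption about how the client delegate could behave, not a description of existing code. The `links` dictionary stands in for asking the service whether a node is a link and what it points to, and `resolve_links` is a made-up name.

```python
def resolve_links(uri, links, on_link=None, max_hops=10):
    """Follow link nodes until a node that is not a link is reached.

    links:   maps a link node URI to its target URI (stand-in for a service lookup).
    on_link: optional callback, called as on_link(source, target) for each hop,
             so the desktop client can update its tree as links are followed.
    """
    seen = set()
    current = uri
    while current in links:
        if current in seen or len(seen) >= max_hops:
            raise ValueError(f"link loop or too many hops at {current}")
        seen.add(current)
        target = links[current]
        if on_link is not None:
            on_link(current, target)
        current = target
    return current
```

The server side delegate could call the same function without a callback, which gives the transparent behaviour wanted there.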
4 (ivoa) VOSpace 2.x (REST) interface
We need to lead the IVOA discussion on this, implementing prototypes and proposing versions of the schema.
There is a lot of support within the IVOA for the REST interface, but it will need careful handling to
reach a balance between the differing requirements.
Some of the groups interested in implementing vospace 2.x have not been directly involved in
the development of vospace 1.x, so our experience should put us in a good position.
However, if we don't join in early, we may lose our lead.
A key aim for us would be to base vospace 2.x on what we have learned from vospace 1.x,
BUT avoid the problems we found in vospace 1.x.
5 (CORE) Desktop integration
A number of things need to be in place before we can use vos:// URIs
for users' home space identifiers in the vodesktop.
5.1 Basic URI handling
- All our vospace services need to be upgraded to vospace 1.1.
- All our CEA and DSA services need to be able to handle vos:// identifiers.
- The vodesktop client needs to be updated to handle vos:// identifiers.
- The vodesktop client needs to be able to explore data in a vospace system.
5.2 Security
In theory we can deploy vospace without the security system.
However, unless there is an urgent reason for doing so I would
suggest we should get the basic owner/other security in place
before we make the changeover.
At the moment, users can access other people's data using the myspace interface,
but only by modifying the ivo: URIs manually.
Vospace is designed to show the whole VO space as one integrated system,
promoting data sharing between users.
If we deploy the vospace system without some basic level of security,
everyone will be able to see, and modify, everyone else's data
directly from the vodesktop.
This means that our vospace services need to have the
basic owner/other security policy in place, and all of our
CEA/DSA services will need to be able to handle certificate delegation.
During the transition we could configure our services to
provide access via both a secure and a non-secure access endpoint.
To make use of this we will need some extra functionality in the client
to check whether a service is capable of using a secure or insecure
access point.
5.3 Community
As far as I know, most of the changes required to the Community service are already in place.
There may be some minor issues with account creation still to solve.
5.4 (SKIP) Transition period
During the transition we will need to operate a mixed system,
where some services and clients can handle vos: identifiers and
others cannot.
Although we should do as much as possible to minimise the
transition period, it is unlikely that we will be able to
update all of the clients and services in one go.
As a result, the vodesktop client will need to be able to convert between
vos: and ivo: identifiers depending on the
capabilities of a particular service.
The metadata properties required to drive the conversion
functions are already in the relevant IVOA standards.
5.4.1 'access vospace 1.1' capability
The first step is to define a new service capability identifier that represents 'can access vospace 1.1'.
Then, when we update our services to include the vospace 1.1 delegate,
we also add the 'can access vospace 1.1' capability to the service registration.
The new capability URI does not have to have an interface or endpoint associated with it.
It just acts as a marker to identify CEA/DSA services that can resolve vos: identifiers
and access data in a vospace 1.1 service.
This will enable a client to determine which services are capable of using the new protocol
and which are not.
5.4.2 (skip) 'myspace alternative' capability
The second step uses the node capabilities list generated by a vospace 1.1 service,
which lists the alternative ways of accessing a node.
The AstroGrid vospace services will implement both the legacy myspace and the new vospace 1.1 interfaces.
If a vospace node is also accessible via a myspace interface,
then the node capabilities metadata will contain details
of how to use the myspace interface to access the node.
This means that the vodesktop client can use the vospace node metadata
to get the corresponding myspace identifier for the same node.
5.4.3 (skip) URI translation
If a user has been allocated a vospace URI for their home space,
then the vodesktop will use the vospace protocol to retrieve and
display their vospace tree.
When the user sends a task to a CEA or DSA service
the vodesktop will need to check the service registration
for the 'can access vospace 1.1' capability.
If the service is capable of handling vos: identifiers,
then the task document can be sent to the service
without modification.
If the target service is not capable of using vospace
vos: URIs, then the client will need to translate vos: URIs
into the corresponding myspace ivo: URIs before the task
is sent to the service.
To do this, the client will have to check the vospace node metadata
for an alternative myspace access URI.
If the vospace service does provide a myspace alternative
then the client can substitute the alternative URI
in the task document before sending it to the CEA/DSA service.
If the vospace service does not provide a myspace alternative,
then the task will have to be rejected.
Doing the translation at this stage, just before the task is sent,
means that we can also trap tasks submitted via the Python scripting
interface as well as those created using the UI tools.
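The decision sequence described above can be sketched in Python as follows. Everything named here is an assumption made for illustration. The capability URI is a placeholder, since the real identifier has not been defined yet. The `myspace_alternatives` dictionary stands in for looking up the node capabilities metadata, and `translate_task_uris` is not an existing vodesktop function.

```python
# Placeholder marker URI; the real 'can access vospace 1.1' identifier is not defined yet.
VOSPACE_CAPABILITY = "ivo://example.org/capability#vospace-1.1"


class TaskRejected(Exception):
    """Raised when the target service has no way to reach a vospace node."""


def translate_task_uris(task_uris, service_capabilities, myspace_alternatives):
    """Return the URIs to place in the task document before it is sent.

    task_uris:            URIs found in the task parameters.
    service_capabilities: capability URIs from the service registration.
    myspace_alternatives: maps a vos: URI to its myspace ivo: URI, if one exists.
    """
    if VOSPACE_CAPABILITY in service_capabilities:
        # The service can resolve vos: identifiers, so send the task unchanged.
        return list(task_uris)
    translated = []
    for uri in task_uris:
        if not uri.startswith("vos:"):
            translated.append(uri)
            continue
        alternative = myspace_alternatives.get(uri)
        if alternative is None:
            raise TaskRejected(f"service can't access vospace: {uri}")
        translated.append(alternative)
    return translated
```

Because the function only looks at the URIs and the metadata, it does not care whether the task came from the UI tools or the scripting interface.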
There are a couple of problems still to solve, but basically this should work.
Known unknowns:
- How to handle the myspace URI that is returned by the DSA/CEA
- How to handle vospace .auto names
5.4.4 (skip) File selection
If a CEA/DSA service cannot handle a vos: URI and the vospace service
does not offer a myspace alternative, then the user will receive a
"service can't access vospace" error when the task is submitted.
To mitigate this, we could modify the file selection dialog
to check the service metadata and modify the list of available nodes
depending on the service capabilities.
If the user is building a task for a service that cannot access
vospace, then the file selection window can prevent them from
selecting a node in vospace that does not have an alternative
myspace endpoint.
One way to present this to the user would be to 'grey out' nodes that the
selected service cannot access.
In some cases we will not be able to avoid the 'unknown' state shown in the diagram.
If the file selection window does not know which service the task is for,
or it cannot determine the service capabilities, then it will not be able to check if the service can access the node.
The most common case where this will occur is for CEA applications that have more than one service instance.
In theory we could process the metadata for all the services, and base the result on the
common set of capabilities. However, this would mean excluding nodes that some of the services can access.
Alternatively, we could enable nodes that at least one service can access, but this would mean that
selecting a particular node could change the list of services that the task would be valid for.
For now, we should start with the simple case where we know that the service either can or cannot access the node,
and figure out how to display this information to the user in some way.
If anyone has suggestions on how to handle the more complex case please let me know.
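As a starting point for discussion, the three states and the two multi-service options can be written down as a short Python sketch. This is a simplification. It treats each node and each service as a plain set of interface names such as "vospace-1.1" and "myspace", which are labels invented for the example, and all of the function names are hypothetical.

```python
from enum import Enum


class Access(Enum):
    ACCESSIBLE = "accessible"
    INACCESSIBLE = "inaccessible"  # candidate for 'grey out'
    UNKNOWN = "unknown"


def node_access(node_interfaces, service_interfaces):
    """Classify one node for one service.

    service_interfaces is None when the service, or its metadata, is not known.
    """
    if service_interfaces is None:
        return Access.UNKNOWN
    if set(node_interfaces) & set(service_interfaces):
        return Access.ACCESSIBLE
    return Access.INACCESSIBLE


def common_interfaces(services):
    """Strict option: interfaces that every service instance can use."""
    if not services:
        return set()
    return set(services[0]).intersection(*services[1:])


def any_interfaces(services):
    """Permissive option: interfaces that at least one service instance can use."""
    return set().union(*services)
```

Passing the result of `common_interfaces` or `any_interfaces` to `node_access` gives the two behaviours discussed above, so the choice between them can be left open until we know how to present it to the user.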
Note that this is not a 'one off' situation caused by the transition
from myspace to vospace.
At the time of writing, several groups within IVOA are interested in implementing
the REST vospace 2.x service interface, but not the SOAP vospace 1.x interface.
This means that the VO will always have a mixture of different services,
each capable of accessing a different subset of vospace interfaces.
By learning how to solve this problem now, we will be able to
cope with a mixture of different versions and capabilities
when they become available in the wider VO.
Known unknowns:
- How to display this to the user
- Multiple services for a single CEA
- Vospace identifiers embedded in TAP/ADQL queries
5.4.5 (skip) Changes required
5.4.5.1 (skip) File selection context
In order to selectively 'grey out' nodes that the target service can't access,
the file selection window needs access to the metadata
for the service that the task is being created for.
5.4.5.2 (skip) Capability detection
The vospace client package needs to provide a helper that
can interpret the metadata for a CEA/DSA service and a vospace node,
and indicate whether the service will be able to access the node.
5.4.5.3 (skip) URI conversion
The vospace client package needs to provide a tool for converting
a vos: URI into the corresponding ivo: URI if the vospace service
provides an alternative myspace interface.
The vodesktop client needs to be able to trap a CEA/DSA task
before it is sent and check if the target service can access
the vospace nodes referred to in the task parameters.
Note, the vodesktop already has a hook in place to process the
different forms of ivo: identifiers in a task document.
If this hook also has access to the CEA/DSA service metadata, then
the vos: to ivo: translation can be added at this point.
--
DaveMorris - 13 Jun 2008