AstroGrid-2 Technical Scope
Draft Only

Introduction

This document outlines the scope of work to be tackled in the AstroGrid-2 (AG2) project. This work is mainly targeted to run from Jan-2004, when the AstroGrid-1 project (AG1) finishes, until Dec-2007, but some work may begin during 2004 as preparation.

[Note by Andy : my proposed schedule is at AndyAG2Schedule]

The AG2 Proposal to PPARC was quite extensive and covered:

  • extensions to the AG1 infrastructure
  • R&D on new technologies
  • funding for data centre uptake of AG
  • other areas such as outreach

The feedback from peer review and PPARC was that the latter two of these areas were outwith the scope of AG2, while the actual level of funding provided meant that the scope of the first two would need to be curtailed from that proposed. A design study was conducted during Feb/Mar 2004 by the Project Manager and Project Scientist into the desired scope of AG2. This document is the result of work by the Project Manager and is based on several meetings held with AG partners and other projects; it outlines the technical scope and will ultimately be incorporated into a full design document for the project.

Executive Summary

AstroGrid-2 work will cover:
  • extending the functionality of components developed in AstroGrid-1
  • adding new components to the infrastructure
  • making use of Grid middleware

The core of AstroGrid is workflow. The astronomer is able to execute complex workflows involving tasks running in sequence and in parallel, and to store a workflow design and then retrieve, modify and re-execute it later. In a way, this will allow the evolution of a 'community memory', with the detailed knowledge of one person accessible to all.

To be able to access and use 'knowledge' rather than just data requires much more

Outcome

This document and the associated 'science scope' one are only the first stage in scoping the AstroGrid-2 project. The documents will go through several iterations (what else?) during March and will merge into a single project scope document. This will serve as a statement of intent for the AG2 project.

Following the production of the project scope, more detailed work will be carried out to identify firm deliverables and milestones, along the lines of those produced for AG1 (see IterationDeliverablesR01 for example).

Architecture

Although the overall AstroGrid vision has remained much the same over the past two years, the diagrams used to describe it have changed. The last view (see ArchOverview) attempted to show components and major linkages between them. We have since recognised that linkages are relatively unimportant in an architecture which stresses component independence; to draw real and potential links would make for a very messy diagram. With the increasing movement of the 'grid world' to web services it is also more important that we distinguish those areas which are strictly VObs infrastructure from those which are more likely to come under the classification of grid middleware.

I have therefore reworked the architecture diagram:

Arch2004c.gif

(Also available, the above diagram without fill colours.)

The following description of the layers and components will serve as the technical scope of the AstroGrid-2 project.

Layers

Producing a layer diagram such as that above is as much about demarcation of effort as about division of functionality. I've tended to apply the rule that infrastructure is what we write and middleware what someone else provides (or should provide in an ideal world). Unfortunately, given the immaturity of grid developments and the grid world's frequent adoption of different technological bases, it will be necessary for AG to provide some middleware until such time as the grid world catches up (and, who knows, maybe our implementations will become the starting point for some of that middleware).

Data

This layer represents all the astronomical data which the VObs will make available to the astronomer. This will include catalogs, observing logs, images, spectra etc. One of the more 'difficult' areas of serving data is that of metadata: each dataset will have different metadata, each data centre will store that metadata (or not) in a different way and relate it to the data differently. All of these need to be viewed through a common standard - an issue we are addressing through the IVOA Data Model working group.

Middleware

At the moment, the only middleware we are using in AG are basic web services. Services such as virtual storage space (disk and database) and security (authentication & authorisation) will be developed as part of the OGSA framework but they are not yet available. AG is currently developing services in this arena (MySpace and inter-component auth/auth security); we expect that these will be replaced by grid middleware as it becomes available.

The three frameworks supporting agents, data mining and visualization are more likely to be seen as part of the VObs infrastructure, given that astronomy's needs might be rather specialised. However, there are efforts looking at incorporating such services into the grid framework, and much of the functionality might be generalised to other usage, so it makes sense to consider them as middleware.

Virtual Observatory Infrastructure

The components listed here are what we, in AstroGrid, are calling infrastructure: that which is necessary to allow people to build VObs applications. Again, the distinction with middleware is blurred; workflow and community components might one day be replaced by grid middleware but there seems less commonality of approach amongst the projects developing these types of component so it is safe to imagine them as distinct VObs infrastructure.

Tools

The astronomy community has many tools at its disposal but most are developed for use on a single person's workstation. Applications which make use of the VObs infrastructure will take time to appear and tools which can be executed on remote servers as part of complex workflows may require specialised development. AG has developed a couple of such applications and has created a generic wrapper for command line tools. We will continue to devote some effort to tools development but it will remain a small part of the project and driven solely by the need to prove the infrastructure.

Astronomer Interface

This is the typical presentation layer. AG1 has developed the Portal component. This provides browser-based access to the full raft of VObs functionality. Some functions, however, may not be suitable for this approach. Detailed graphics work is easier to perform using a dedicated application. And often a user will want to download a set of data and perform analyses and manipulations offline. For this reason, AG2 will develop an astronomer's workbench, providing access to VObs services and dedicated applications which can be run from a workstation. We will also develop a command line interface (CLI) which gives access to VObs services from a single command line: this is a lower priority and may not make the final cut of functionality.

Components

For ease of reference I have divided this list into those components which AG1 has already delivered and will continue to develop in AG2, and those components relatively new to the project.

Existing

Under each of these components I have identified, in a few words, the broad functionality of the component. Then I have described what will be done in AG2 to improve on that functionality or to adapt the component to work with the new ones.

Portal

By the end of AG1, the Portal will be largely complete. It is, as with most of the AG technologies, more a framework than an application. Built using Apache Cocoon, it delivers a highly customisable browser-based interface which will provide the capability for:
  • applications to be fully integrated into the Portal, or
  • tools to specify a simple forms-based interface for the creation of workflow inputs.

In AG2 it will be necessary to integrate the new components into the Portal where they must present a user interface and, as we create or adapt tools for science use, to create the appropriate forms definitions. We will also work with third party applications developers to help them fit their tools to the Portal.

Warning: if the IVOA decides to adopt a different standard for fitting applications to portals and for presenting tools' interfaces then AG will need to change the Portal accordingly. This might even mean dropping the existing Portal and adopting a different technology (it is more likely that the standard would be some commonly agreed one, in which case we will work with Cocoon developers to make Cocoon compliant with that standard, eg JSR 168).

Workflow

The Workflow component actually consists of two completely separate parts:
  • a user interface for workflow construction
  • the Job Execution Service (JES)

AG1 will provide the ability to create a complex workflow with parallel flows, sequences, branching and conditional execution. The description of a workflow will be constructed using a subset of the standard BPEL specification. The JES will execute, or cause to be executed, each task in a workflow. This might be done by proxy so that the results of the task can be returned remotely. Where the JES or proxy is co-located with a MySpace module, the results can be stored in the user's own virtual storage area.
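As a minimal sketch of the idea, a workflow with sequences, parallel flows and conditional steps can be represented as a small tree that an executor walks. The names Task, Sequence, Parallel, Conditional and run are hypothetical illustrations, not the AG1 workflow schema or the JES interface, and parallel branches are simply run one after another here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Context = Dict[str, object]  # results of completed tasks, keyed by task name


@dataclass
class Task:
    name: str
    action: Callable[[Context], object]


@dataclass
class Sequence:
    steps: List["Node"]


@dataclass
class Parallel:
    branches: List["Node"]


@dataclass
class Conditional:
    predicate: Callable[[Context], bool]
    then: "Node"


Node = Union[Task, Sequence, Parallel, Conditional]


def run(node: Node, ctx: Context) -> Context:
    """Execute a workflow tree, storing each task's result in ctx."""
    if isinstance(node, Task):
        ctx[node.name] = node.action(ctx)
    elif isinstance(node, Sequence):
        for step in node.steps:
            run(step, ctx)
    elif isinstance(node, Parallel):
        # Assumption: a real executor would dispatch these concurrently.
        for branch in node.branches:
            run(branch, ctx)
    elif isinstance(node, Conditional):
        if node.predicate(ctx):
            run(node.then, ctx)
    return ctx


# Two independent queries, then a step that depends on both results.
workflow = Sequence([
    Parallel([
        Task("optical", lambda ctx: 120),
        Task("xray", lambda ctx: 30),
    ]),
    Task("matched", lambda ctx: min(ctx["optical"], ctx["xray"])),
    Conditional(lambda ctx: ctx["matched"] > 0,
                Task("report", lambda ctx: f"{ctx['matched']} matches")),
])
result = run(workflow, {})
```

Because the workflow is plain data, it can be stored, retrieved, modified and re-executed, which is the property the 'community memory' idea relies on.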

In AG2, we will incorporate task and query estimation techniques and various aspects of Resource Discovery (see below). Learning from work being done on agents technologies by AG partners, we will add the ability of tasks to respond to specified events and to invoke workflows based on criteria defined by users (see Agent Framework below).

Community

AG1 will deliver the functionality to allow the management of communities: creating accounts (persons) and groups, adding accounts to groups and removing them, assigning privileges to groups etc. All of this is much along the lines of that specified in the Globus CAS technology.

We do not anticipate significant changes to this component in AG2.

Registry

The Registry is the centre of the virtual observatory. All resources are listed in it along with metadata that allows those resources to be matched to the task a user wishes to perform. AG1 will deliver basic Registry functionality, allowing resources to be added or removed, updated and harvested from other registries. Queries can be made against the registry using either an XQuery based method or an IVOA standard query interface.
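The basic operations described above (add, update, remove, harvest, query) can be sketched with an in-memory structure. This is an illustration only: the Resource fields, the example identifiers and the keyword search are assumptions, and neither the XQuery method nor the IVOA query interface is modelled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Resource:
    identifier: str          # hypothetical identifier, e.g. "ivo://example/cat1"
    kind: str                # e.g. "catalogue", "image-service"
    keywords: FrozenSet[str]


class Registry:
    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def add(self, resource: Resource) -> None:
        """Add a resource, or update it if the identifier already exists."""
        self._resources[resource.identifier] = resource

    def remove(self, identifier: str) -> None:
        self._resources.pop(identifier, None)

    def harvest(self, other: "Registry") -> int:
        """Copy resources we do not yet hold from another registry."""
        added = 0
        for resource in other._resources.values():
            if resource.identifier not in self._resources:
                self._resources[resource.identifier] = resource
                added += 1
        return added

    def find(self, kind: str, keyword: str) -> List[str]:
        """Return identifiers of resources matching a kind and a keyword."""
        return sorted(r.identifier for r in self._resources.values()
                      if r.kind == kind and keyword in r.keywords)


local, remote = Registry(), Registry()
local.add(Resource("ivo://example/cat1", "catalogue", frozenset({"quasar"})))
remote.add(Resource("ivo://example/cat1", "catalogue", frozenset({"quasar"})))
remote.add(Resource("ivo://example/cat2", "catalogue", frozenset({"quasar", "xray"})))
harvested = local.harvest(remote)
matches = local.find("catalogue", "quasar")
```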

In AG2, work will concentrate on extending the scope of the registry: adding new types of resource and adapting the registry schema to allow these resources. The schema for existing resources will be extended to allow more extensive metadata to be held. In particular, we will develop schemas for Solar and STP data services (this work will start during AG1).

One area of concern to data centres is how the metadata that they currently hold and which is held in forms unique to each dataset can be made accessible to the VObs. Although I currently feel that producing a generic solution to this problem may well cost more than many specific ones, it is an area we will wish to investigate in conjunction with staff in the data centres.

Dataset Access

The Data Centre component of AG1 provides access to data in a way that makes it easy to adapt to the different storage mechanisms employed in astronomy. The component will accept a query in the IVOA standard ADQL form and make it available to plug-ins that are written specifically for different storage mechanisms (eg different relational database systems). During AG1, this will be extended to provide access to image and spectral data, using the IVOA SIAP and SSAP standards and to various types of Solar and STP data (eg CDF format data). Some of this work will extend into AG2.
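The plug-in arrangement might look roughly like the following sketch, in which one query object is handed to interchangeable back ends. The Query class is a deliberately simplified stand-in and is not ADQL; StoragePlugin, SqlitePlugin and ListPlugin are hypothetical names, and SQLite is used only because it ships with Python.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Query:
    """Simplified stand-in for a parsed query (not real ADQL)."""
    table: str
    columns: Sequence[str]
    column: str   # column being constrained
    upper: float  # keep rows where column <= upper


class StoragePlugin(ABC):
    @abstractmethod
    def execute(self, query: Query) -> List[Tuple]:
        """Run the query against one storage mechanism."""


class SqlitePlugin(StoragePlugin):
    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection

    def execute(self, query: Query) -> List[Tuple]:
        # Identifiers are quoted; the constraint value is bound as a parameter.
        cols = ", ".join(f'"{c}"' for c in query.columns)
        sql = (f'SELECT {cols} FROM "{query.table}" '
               f'WHERE "{query.column}" <= ? ORDER BY "{query.column}"')
        return self.connection.execute(sql, (query.upper,)).fetchall()


class ListPlugin(StoragePlugin):
    """Back end for data held as plain Python rows (e.g. parsed files)."""

    def __init__(self, tables: Dict[str, List[dict]]) -> None:
        self.tables = tables

    def execute(self, query: Query) -> List[Tuple]:
        rows = [r for r in self.tables[query.table]
                if r[query.column] <= query.upper]
        rows.sort(key=lambda r: r[query.column])
        return [tuple(r[c] for c in query.columns) for r in rows]


rows = [("A", 9.5), ("B", 11.0), ("C", 14.2)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (name TEXT, mag REAL)")
conn.executemany("INSERT INTO stars VALUES (?, ?)", rows)
plugins = [SqlitePlugin(conn),
           ListPlugin({"stars": [{"name": n, "mag": m} for n, m in rows]})]
query = Query("stars", ["name", "mag"], "mag", 12.0)
results = [plugin.execute(query) for plugin in plugins]
```

The caller never needs to know which storage mechanism answered; both back ends return the same rows for the same query.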

During AG2, we will continue the above work, adding new access mechanisms as necessary. Once the IVOA DM group has produced a coherent model of metadata that spans different types of astronomical data, we will also add a query mechanism that utilises this model, probably using XQuery. Extending the query mechanism into time-series data (especially important for solar system data) may require further changes to the various query mechanisms.

A more significant advance during AG2 will be the development of data access across multiple databases. It is still uncertain whether this will be implemented at the workflow level (splitting a multi-db query into separate single-db queries with merge-and-select tasks intervening), at some framework level (eg the Data Mining framework below which will carry out the separate queries, perhaps using specialised resources) or at the data centre level (where a master database serves queries out to slave databases and manages the results).
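The first of these options, splitting at the workflow level, can be illustrated with a toy example: two single-database queries followed by a merge step and a select step. The data, function names and selection criterion are invented for illustration, and matching is by a shared object name rather than by position on the sky.

```python
from typing import Callable, Dict, List, Tuple

# Stand-ins for two separate databases: object name -> magnitude.
OPTICAL = {"obj1": 12.0, "obj2": 15.5, "obj3": 14.0}
INFRARED = {"obj1": 10.0, "obj3": 13.5, "obj4": 9.0}


def query_optical() -> Dict[str, float]:
    """Single-database query against the optical catalogue."""
    return dict(OPTICAL)


def query_infrared() -> Dict[str, float]:
    """Single-database query against the infrared catalogue."""
    return dict(INFRARED)


def merge(left: Dict[str, float],
          right: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Keep only objects present in both result sets."""
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}


def select(merged: Dict[str, Tuple[float, float]],
           predicate: Callable[[float, float], bool]) -> List[str]:
    """Apply a condition that needs values from both databases."""
    return sorted(key for key, pair in merged.items() if predicate(*pair))


merged = merge(query_optical(), query_infrared())
red_objects = select(merged, lambda optical, infrared: optical - infrared > 1.0)
```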

MySpace

The concept of virtual storage space, allowing a user to store items of data, whether files or database tables, at remote locations without having to have an account at those locations, and to access those items without needing to specify any physical address, is one promised by grid technologies. AG1 provides an implementation of this concept called MySpace, currently making virtual file space available to users with appropriately configured Community accounts (implementation of MySpace does rely on reciprocal arrangements between computer centres).

During AG2, we will extend this concept to the storage of database tables (this work may begin during the later iterations of AG1) and to access to files on the user's workstation. We will also implement user-specifiable access permissions and quota (time- and space-based) booking, allocation and monitoring. Many of these features will require additions to the Community module: these additions will be implemented as optional. Some means of defining and checking reciprocal space arrangements will also be needed.
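To make the virtual storage idea concrete, here is a minimal sketch in which the user deals only in logical names while the store decides where the data physically lives and enforces a space quota. VirtualStore, QuotaExceeded, the server names and the round-robin placement are all assumptions for illustration and do not describe the MySpace implementation.

```python
from typing import Dict, List, Tuple


class QuotaExceeded(Exception):
    pass


class VirtualStore:
    def __init__(self, servers: List[str], quota_bytes: int) -> None:
        self.servers = servers
        self.quota_bytes = quota_bytes  # per-user space quota
        # (user, logical name) -> (server, size in bytes)
        self._locations: Dict[Tuple[str, str], Tuple[str, int]] = {}

    def used(self, user: str) -> int:
        return sum(size for (owner, _), (_, size) in self._locations.items()
                   if owner == user)

    def put(self, user: str, logical_name: str, size: int) -> None:
        """Record an item under a logical name, subject to the quota."""
        previous = self._locations.get((user, logical_name), ("", 0))[1]
        if self.used(user) - previous + size > self.quota_bytes:
            raise QuotaExceeded(f"{user} would exceed {self.quota_bytes} bytes")
        # Placement is hidden from the user; simple round-robin here.
        server = self.servers[len(self._locations) % len(self.servers)]
        self._locations[(user, logical_name)] = (server, size)

    def resolve(self, user: str, logical_name: str) -> str:
        """Translate a logical name into a physical address."""
        server, _ = self._locations[(user, logical_name)]
        return f"{server}/{user}/{logical_name}"


store = VirtualStore(["siteA", "siteB"], quota_bytes=100)
store.put("alice", "results/run1.vot", 60)
address = store.resolve("alice", "results/run1.vot")
```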

Auth/Auth Security

Proper authentication and authorisation processes are another aspect expected to be delivered via the grid middleware. Much of this, however, will be implemented via personal digital certificates. In AG, we did not anticipate many users having (or wishing to have) these in the early stages of VObs implementation. Our solution to the issue of authentication in AG1 has been to allow simple username/password verification of identity via the portal but to implement inter-component communication using server-based certificates. For authorisation, data centres can associate access policies with their datasets and communities will manage membership of the groups which have been granted access rights. As with the Community component, this is consistent with the Globus CAS technology.
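The division of responsibility for authorisation can be sketched as follows: the community manages who belongs to which group, and the data centre records which groups may read each dataset. The class and method names are hypothetical, and authentication (passwords, certificates) is deliberately left out.

```python
from typing import Dict, Set


class Community:
    """Manages accounts and their group membership."""

    def __init__(self) -> None:
        self.groups: Dict[str, Set[str]] = {}

    def add_to_group(self, account: str, group: str) -> None:
        self.groups.setdefault(group, set()).add(account)

    def remove_from_group(self, account: str, group: str) -> None:
        self.groups.get(group, set()).discard(account)

    def groups_of(self, account: str) -> Set[str]:
        return {group for group, members in self.groups.items()
                if account in members}


class DataCentre:
    """Associates access policies (groups allowed to read) with datasets."""

    def __init__(self) -> None:
        self.policies: Dict[str, Set[str]] = {}

    def grant(self, dataset: str, group: str) -> None:
        self.policies.setdefault(dataset, set()).add(group)

    def may_read(self, dataset: str, account: str, community: Community) -> bool:
        allowed = self.policies.get(dataset, set())
        return bool(allowed & community.groups_of(account))


community = Community()
community.add_to_group("alice", "survey-team")
centre = DataCentre()
centre.grant("proprietary-survey", "survey-team")
alice_ok = centre.may_read("proprietary-survey", "alice", community)
bob_ok = centre.may_read("proprietary-survey", "bob", community)
```

The data centre never needs a list of individual users; removing someone from the group at the community is enough to withdraw access.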

This solution is likely to remain in place for the medium term but we will be tracking activity in the Web Services and Grid standards arena.

New

The specification of these new components is necessarily simplistic at this stage. More details will be worked out over the coming months in discussion with partners, team members and external projects.

Workbench

The Virtual Observatory Workbench (VOW) is more than just another client-based application for visualizing data or conducting data analyses. VOW will provide a complete framework within which applications can run and interoperate, allowing any and all VObs data and services to be accessed. In approach, it is similar to the Starlink ADAM software environment but built with modern technologies and incorporating the VObs approach to independent web services.

I would hope that we can adapt an existing technology, such as Eclipse (see, for example, the description of its Rich Client Platform), to provide the basic tool framework but there may be existing tools in the astronomy community that will serve as a base. More than providing a simple and common interface for applications, VOW will provide a common object model for any application allowing inter-application communication. We will work with tools providers to ensure adoption of this model so that a user can swap one application for another without losing interoperability with other applications.

AG2 will develop the VOW and will provide interfaces to the server-based VObs data, tools and applications. We will document the core API and how VObs resources can be accessed. We will provide some basic functionality by adapting existing tools or providing custom-built ones but will expect most functionality to be provided by the tools providers themselves.

CLI

It is unlikely that we will create a completely new Command Line Interface (CLI) tool for running VObs services; we will probably instead provide appropriate extensions to existing, popular interfaces. The direction of this part of the project will be determined after discussion with the Science Advisory Group (AGSAG).

Resource Discovery

This incorporates many aspects but the two of most immediate import are:
  • recognising that several resources might deliver the same service and selecting the optimum;
  • defining service metadata in such a way that the above selection can be made.

The first aspect encompasses the identification of criteria which matter in selecting a resource, such as the current state of a service and its host machine or the state of co-located resources, getting up to date information on those criteria (and testing that the information is indeed up to date and valid) and then making that information available to the JES.
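A rough sketch of this selection step is given below: status reports that are stale or mark a service as unavailable are discarded, and the least loaded of the remainder is chosen. The Status fields, the use of load as the only criterion and the five-minute freshness limit are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Status:
    endpoint: str       # hypothetical service address
    available: bool
    load: float         # 0.0 (idle) to 1.0 (saturated)
    reported_at: float  # time of the report, in seconds


def select_resource(statuses: List[Status], now: float,
                    max_age: float = 300.0) -> Optional[Status]:
    """Pick the least loaded available service with a fresh status report."""
    fresh = [s for s in statuses
             if s.available and now - s.reported_at <= max_age]
    return min(fresh, key=lambda s: s.load, default=None)


now = 10_000.0
candidates = [
    Status("mirror-a", available=True, load=0.2, reported_at=now - 3600),  # stale
    Status("mirror-b", available=True, load=0.7, reported_at=now - 30),
    Status("mirror-c", available=True, load=0.4, reported_at=now - 60),
    Status("mirror-d", available=False, load=0.0, reported_at=now - 10),   # down
]
chosen = select_resource(candidates, now)
```

Note that mirror-a reports the lowest load but is ignored because its information is an hour old, which is the reason the text stresses testing that the information is up to date.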

The second aspect requires the investigation of metadata and its relationships: an area termed Ontology in the intelligent systems field.

While much generic work is being done in these and related fields, astronomy will require domain-specific solutions given the particular nature of its data and metadata. For example: when is a dataset a mirror of another dataset? Who is responsible for verifying that datasets are equivalent? How should data be tagged as appropriate for one type of analysis and not for another? While these problems are not unique to astronomy, their investigation in the context of astronomy data is essential for us to proceed beyond the user merely picking a dataset from those listed in the Registry.

VObs Support Services

This is less a single component than a catch-all for those tools which AG will need to provide in order for common VObs services to interoperate. As an example, we will provide data conversion and federation tools which will allow workflows that stitch together tasks whose outputs and inputs are in incompatible formats. Many of these tools may not already exist since the VObs is developing novel methods of application interoperability, but where they do exist we will work with the tools providers to adapt them to VObs compatibility.
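One way to picture such conversion tools is a lookup of converters keyed by (produced format, expected format), which a workflow can call between two tasks. The sketch below uses CSV and JSON purely because Python can handle them without extra libraries; the format labels, the converter decorator and the bridge function are hypothetical, and astronomical formats are not modelled.

```python
import csv
import io
import json
from typing import Callable, Dict, Tuple

CONVERTERS: Dict[Tuple[str, str], Callable[[str], str]] = {}


def converter(source: str, target: str):
    """Register a function that converts from one format to another."""
    def register(function: Callable[[str], str]) -> Callable[[str], str]:
        CONVERTERS[(source, target)] = function
        return function
    return register


@converter("csv", "json")
def csv_to_json(text: str) -> str:
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows)


def bridge(data: str, produced: str, expected: str) -> str:
    """Convert one task's output into the format the next task expects."""
    if produced == expected:
        return data
    try:
        convert = CONVERTERS[(produced, expected)]
    except KeyError:
        raise ValueError(f"no converter from {produced} to {expected}") from None
    return convert(data)


task_output = "name,mag\nA,9.5\nB,11.0\n"
next_input = bridge(task_output, "csv", "json")
```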

Agent Framework

The concept of agents is a difficult one: viewed from one angle, it looks like science fiction (or fantasy) where programs make decisions, run other programs and might persist for years without intervention from their creator; viewed from another angle, they are simply programs designed to persist on a server. The truth (from the AG2 perspective) is somewhere in between but nearer the latter than the former.

The framework we provide will, in the first instance, be relatively simple. It will provide the ability for a VObs service to be deployed such that it persists on a server, can respond to events (specified by a combination of the type of service and associated configuration file, so a service designed to respond to changes in data will have a config file which specifies the data location and type of change to watch for) and can invoke a workflow managed at another location. This will be done in conjunction with the eStar and associated projects to ensure that our framework is consistent with the needs of these projects.
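The simple form of agent described here could be sketched as a persistent object that is configured with a data location and a type of change, checks for that change when polled, and invokes a workflow when it occurs. Everything named below (AgentConfig, DataChangeAgent, the row-count check, the callback signatures) is an assumption for illustration; in particular the workflow invocation is a local callback standing in for a call to a remote service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentConfig:
    data_location: str  # where to watch
    change: str         # only "row_count_increase" is handled in this sketch
    workflow_id: str    # workflow to invoke when the change is seen


class DataChangeAgent:
    def __init__(self, config: AgentConfig,
                 read_row_count: Callable[[str], int],
                 invoke_workflow: Callable[[str, Dict[str, object]], None]) -> None:
        self.config = config
        self.read_row_count = read_row_count
        self.invoke_workflow = invoke_workflow
        self._last = read_row_count(config.data_location)

    def poll(self) -> bool:
        """Check for the configured change; invoke the workflow if it occurred."""
        current = self.read_row_count(self.config.data_location)
        fired = current > self._last
        if fired:
            self.invoke_workflow(self.config.workflow_id,
                                 {"location": self.config.data_location,
                                  "new_rows": current - self._last})
        self._last = current
        return fired


# Stand-ins for a dataset and for the remote workflow service.
row_counts = {"transients": 100}
invocations: List[Tuple[str, Dict[str, object]]] = []

agent = DataChangeAgent(
    AgentConfig("transients", "row_count_increase", "follow-up"),
    read_row_count=lambda location: row_counts[location],
    invoke_workflow=lambda workflow, params: invocations.append((workflow, params)),
)
first = agent.poll()            # nothing has changed yet
row_counts["transients"] = 103  # new data arrives
second = agent.poll()
```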

A user interface will be added to the Portal to allow a user to configure such an agent service by selecting the type of service, where it should be installed and its configuration. In the short term, it is likely that the framework will be installed at a location with basic service types and the user will only need to configure the events to be responded to and what action to take.

Other aspects of this framework will be provided by modifications to the Workflow component. We will adapt some aspects of agent technology for intelligently determining tasks to be executed and add these to the Workflow component.

Data Mining Framework

In the short term, the data mining framework will simply make some data analysis tools available on servers co-located with large datasets. The first such tool, and one which will allow us to trial several techniques, will be one which provides an in-memory manipulation of kd-tree representations of a large astronomical data-subset.

This will require that we are able to either co-locate the mining framework with several large datasets along with intelligent workflow components or stream the results of a query from the dataset to the location of the framework. We will require that the kd-tree service can remember state by persisting to disk between calls, can put calls from new users on 'hold' while it is engaged with a 'transaction', or both. We will provide function code to manipulate the kd-tree representation, in the first instance code to perform correlations. In the longer term, the framework will provide a common API and object model for tools writers.
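For readers unfamiliar with kd-trees, the sketch below builds one in memory and runs a range query, the kind of operation that counting-based analyses are typically built on. It is a textbook construction written for illustration, not the planned service code, and it does not address persistence or concurrent users.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point: Point, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]) -> None:
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a kd-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(ordered[mid], axis,
                  build(ordered[:mid], depth + 1),
                  build(ordered[mid + 1:], depth + 1))


def range_query(node: Optional[KDNode], lower: Sequence[float],
                upper: Sequence[float],
                found: Optional[List[Point]] = None) -> List[Point]:
    """Collect all points inside the axis-aligned box [lower, upper]."""
    if found is None:
        found = []
    if node is None:
        return found
    if all(lo <= x <= hi for lo, x, hi in zip(lower, node.point, upper)):
        found.append(node.point)
    # Descend only into subtrees that can overlap the query box.
    if lower[node.axis] <= node.point[node.axis]:
        range_query(node.left, lower, upper, found)
    if node.point[node.axis] <= upper[node.axis]:
        range_query(node.right, lower, upper, found)
    return found


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
inside = sorted(range_query(tree, lower=(3, 1), upper=(8, 5)))
```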

Other data mining and analysis tools will be developed or adapted in consultation with the AGSAG.

Visualization Framework

The visualization framework will be developed in close conjunction with the data mining framework. In the short term, we will develop:
  • a modified, server-based version of a 3D tool to provide visualization of multi-dimensional data (will investigate open source options)
    • linked to the data mining framework to enable the visualizations to drive further kd-tree based range queries
  • a service to create jpeg images from image files or cut-outs
  • a portal-based tool which provides functionality similar to Aladin (this will be developed in small stages, testing the feasibility of such a tool)

In the longer term we would wish to develop server-based, multi-dimensional visualization tools which respond dynamically to user input but this will depend on identifying an existing tool or set of technologies that we can adapt without the need for extensive research into little-understood techniques.

Links

AG2 will develop close links to other projects. Many of the PPARC projects in the last round of e-Science funding will be expected to provide applications for AstroGrid or to make use of its VObs infrastructure for accessing data and services.

-- TonyLinde - 01 Mar 2004

Topic revision: r15 - 2004-04-07 - 17:08:23 - TonyLinde
 