Objectives
To establish the requirements for the DBMS technology that will underpin the virtual observatory.
To evaluate a range of DBMS solutions for their suitability.
Inputs
Experience of relational and OO DBMS at AstroGrid sites and others.
Overall science requirements and use-cases from WP-A1.
The existing structure of a range of data archive sites in the UK and elsewhere.
Existing ad-hoc standards for astronomical metadata and for astronomical server inter-working such as FITS, ASU, and GLU.
Software systems available from other disciplines.
Inputs from other work packages.
Tasks
Given the overall science requirements from WP-A1, establish the detailed requirements for a DBMS in terms of data storage, management, querying, and external interfaces, taking account of the need to provide scalable data mining facilities on multi-processor systems.
Develop a few simple benchmark problems from use-cases based on existing astronomical datasets, such as large catalogues, for these evaluations.
At the data storage level, study the options for storing astronomical data in various formats, including FITS and XML. The problems to be studied include efficient access to binary files, sky tessellation, multi-dimensional indexing, the preservation of legacy data, and the generation of homogeneous metadata.
Participate in international efforts, with our partners in the AVO and NVO projects, to define standards for astronomical metadata. This is a joint activity with WP-A2 and WP-A5.
Evaluate a number of DBMS of various types, including relational, object-oriented, and object-relational. Examine and, where appropriate, evaluate GIS, statistical packages, data warehouse solutions, and XML-based DBMS.
Evaluate the INFEO system, developed for searching distributed Earth Observation catalogues, and the Isite tool used in the NERC Metadata Project and supported by NASA, with a view to determining whether parts of them are suitable for use in astronomy.
Evaluate a short-list of solutions in parallel hardware environments such as SMP and Beowulf clusters. Problems to be tackled include federation of datasets over the wide area network, the suitability of SQL or OQL for astronomical queries, and the handling of metadata.
Examine and evaluate middleware solutions for the layer between the astronomically oriented user interface and the standards-based DBMS, including solutions based on SOAP and Java. This will be done in collaboration with WP-A2.
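As an illustration of the kind of simple benchmark problem and sky tessellation scheme referred to in the tasks above, the following is a minimal sketch of a cone search over a catalogue that has been partitioned into declination bands. It is not a proposed design: the class name `DecBandIndex`, the 1-degree band height, and the in-memory storage are all assumptions made for the example, and a real evaluation would compare schemes of this sort against the indexing facilities of each candidate DBMS.

```python
import math
from collections import defaultdict


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


class DecBandIndex:
    """Hypothetical sky index: sources are bucketed by declination band."""

    def __init__(self, band_height_deg=1.0):
        # Band height is an assumed tuning parameter, not a recommendation.
        self.band_height = band_height_deg
        self.bands = defaultdict(list)

    def _band(self, dec):
        # Band 0 starts at the south pole (dec = -90 degrees).
        return int(math.floor((dec + 90.0) / self.band_height))

    def add(self, source_id, ra, dec):
        self.bands[self._band(dec)].append((source_id, ra, dec))

    def cone_search(self, ra, dec, radius_deg):
        """Return sorted ids of sources within radius_deg of (ra, dec)."""
        # Only bands overlapping the declination range of the cone can
        # contain matches, so the remaining bands are never scanned.
        lo = self._band(max(-90.0, dec - radius_deg))
        hi = self._band(min(90.0, dec + radius_deg))
        hits = []
        for b in range(lo, hi + 1):
            for sid, sra, sdec in self.bands.get(b, ()):
                if angular_separation_deg(ra, dec, sra, sdec) <= radius_deg:
                    hits.append(sid)
        return sorted(hits)
```

A benchmark built on this idea might time `cone_search` against a full scan of the same catalogue, and then repeat the comparison with the equivalent query expressed in SQL or OQL on each DBMS under evaluation.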
Outputs
Produce a document listing the virtual observatory database requirements.
Report on the options, benchmark results, and recommendations for DBMS technology to be installed at AstroGrid sites in Phase-B. It may be appropriate to have separate recommendations for solutions suitable for retro-fitting to existing or "legacy" sites and for those more suitable for "green-field" sites such as VISTA.
Report on software developments required during Phase-B, especially in areas such as data mining and data exploration (e.g. correlation, population modelling, outlier discovery). We have good links with statisticians and computer science groups in Belfast, Edinburgh, Pittsburgh, Penn State, and elsewhere, which should facilitate collaborative developments in this area.