Requirements
for the
Astronomical Catalogue Extractor
(ACE)
Martin Hill
mch@roe.ac.uk
The Astronomical Catalogue Extractor is the AstroGrid team's contribution to the
AVO Demonstrator (AVODemo). It is essentially a SExtractor application published as a web service.
Introduction
Background
The Astrophysical Virtual Observatory (AVO) is an organisation studying the development of a distributed astrophysics data set and processing 'grid'. The AVO team is made up of several interested parties, including astronomers, ESO, CDS, etc.
The AVO Demonstrator is a technology demonstrator being prepared for January. Its main purpose is to develop and test technologies that may be used to assemble the AVO. Its main design driver is to carry out scientific analysis of Spectral Distributions for two surveys: the Deep Fields (Hubble North and Chandra South) and the Magellanic Clouds.
This document describes the work to be carried out by the AstroGrid team, as its contribution to the AVO Demonstrator.
Data Sources
The data will be GOODS (the Great Observatories Origins Deep Survey) data sets, from the IR, X-Ray and optical wave bands. Some X-Ray data may also be provided from Chandra (TBD).
A 'cutout' (image cropping) service will provide a square image for a requested area of the sky. This image will be in MEF (Multi-Extension FITS) format, including the measured image and corresponding weight maps. This image may be mosaiced. See
http://www.eso.org/science/goods/brickwall.html
The image file itself is unlikely to contain 'metadata' about, for example, flux scalings or frequency width.
Where will this information come from?
Catalogue Extraction
This is the process of analysing pixel images to locate objects. There are many tools available for this purpose, some of them discipline-specific, such as SExtractor for optical and AIPS for radio. Typically the extraction process is 'tweaked' under direct control of the astronomer, who will specify various parameters describing exactly how the tool is to search for objects. Results are typically given as two-dimensional tables (rows = objects, columns = information about the objects) in FITS or ASCII formats.
Supporting Documents
See AVODemo page for links to Glossary, References, etc.
Understanding the Requirements
Drivers
Since we have no formal requirements, we look briefly here at the 'drivers' for the demonstrator; that is to say, the reasons for making one and the results expected. This gives us a context for our requirements.
The demonstrator is a 'first iteration'; it may be that it requires considerable rework after the January deadline. Obviously, the less work required afterwards the better, as we will be able to work on other things, but tasks are being separated between "For the Demo" and "For Later", the latter being probably for Autumn 2003.
Science Drivers
The demonstrator is expected to provide real science results, as follows:
- Deep Field studies for cosmology. (ESO keen on Chandra Deep Field South)
- Magellanic Clouds for galactic astronomy
- Unify collaboration issues with the different wavelength disciplines
NB, the Demo is to provide tools to carry out this science, not actually carry out the science. This is important to prevent 'scooping' other real scientists involved in these areas.
Technical Drivers
The demonstrator is to trial technologies discussed by the AVO teams, as follows:
- Developing the UI and how to interact with other services
- Develop a remote service (ACE) with appropriate protocols
- Concentrate on AVO/Grid issues rather than writing new applications (for this reason, existing scientific tools should be used if possible)
General Drivers
Encourage the AVO teams to focus on the practical aspects of developing AVO services.
The AVO Demonstrator
See Francoise's work breakdown:
http://wiki.astrogrid.org/pub/Astrogrid/AVODemo/demobreakdown_V03.doc
The AstroGrid team will provide a web service that takes an image and extraction parameters, and returns a table (catalogue) of objects found in that image.
Where possible, the team will design and implement these products in a 'griddy' fashion, in order to test grid technologies and provide a first-stage framework for building further grid-like VO functions.
Extracting object catalogues from images
SExtractor is a commonly used tool for analysing a pixel image and producing a table of the objects it has found in that image. Many (approx 100) parameters can be used to define how the extraction is carried out; for example, how 'definite' the object must be compared to the background noise.
Input is via two ASCII files, one giving parameters and another the columns to be provided in the result set. Parameters can also be given as part of the command line.
Output formats can be selected by setting certain input parameter(s), and are essentially FITS or ASCII variations. A rough calculation gives n x 10k objects for a typical extraction from a typical GOODS image (although we should be dealing only with small cropped 'cutout' images).
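As a rough illustration of the inputs described above, the sketch below assembles a SExtractor command line from an image, the two ASCII files and any overriding parameters. The function name, the file names and the default executable name 'sex' are assumptions for illustration only (some installations use a different executable name); '-c' and 'PARAMETERS_NAME' are the usual SExtractor ways of naming the configuration file and the output column list.

```python
from typing import Dict, List


def build_sextractor_command(image: str,
                             config_file: str,
                             columns_file: str,
                             overrides: Dict[str, str],
                             executable: str = "sex") -> List[str]:
    """Build an argument list for one SExtractor run (hypothetical helper)."""
    # Image first, then the configuration file and the output column list.
    cmd = [executable, image, "-c", config_file,
           "-PARAMETERS_NAME", columns_file]
    # Command-line parameters take precedence over the configuration file.
    for key, value in sorted(overrides.items()):
        cmd += ["-" + key.upper(), str(value)]
    return cmd
```

For example, passing {"DETECT_THRESH": "2.0"} as the overrides appends "-DETECT_THRESH 2.0" to the command.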
Assumptions
The following assumptions have been made:
- ACE will carry out no image processing beyond that done by SExtractor. For example, chi-squared images will be provided by ESO, and only a single image (along with its weighting and perhaps a related chi-squared image) will be accepted per service instance.
- Visualisers are not part of this work package
- Source data will be the Hubble Deep Space images or the Magellanic Clouds; however, the design should allow for any area of the sky.
- The web service will not carry out any extra validation on the input parameters; if SExtractor breaks while running, a simple error message will be returned to the user.
Requirements
Functional Requirements
This section details the functions that the AstroGrid team will provide, and the requirements of each.
Test Harness/User Interface
The UI will provide a mechanism for the user to:
- Specify any and all of the parameters to be sent to the service, including file (eg image) locations.
- 'Post' the parameters to ACE
- Display the results in a human-readable form.
Parameter Form
A form of some sort will be available to the Aladin user to enter parameters for the extraction.
This application will take an image and a parameter set, and apply the SExtractor application to them, returning the resultant object catalogue.
It will include the following functions:
Grid Service
In theory, the SExtractor process may not respond immediately, and so should be considered a stateful AstroGrid Service. However, implementing full Grid Service functionality is beyond the scope of the demonstrator, and only the following grid-like functions will be provided (TBD!):
- The service must provide metadata when requested, essentially as a WSDL file and a schema for the "input" message:
  - Parameters Required
  - Value Units
  - Value Types
- The client must be able to terminate a service instance. (Perhaps the service may stop the application if the HTTP connection is closed; TBD.)
For the moment, the service will be considered 'stateless'.
Starting an Extraction
The "RunExtraction" message will be SOAP, with parameters listed in 'document' style. A schema will be required.
This message will include the following:
- Individual Parameter values
- Reference (URL) to measured image
- (Optional) Reference to 'Master' image (eg chi-squared)
- (Optional) Reference to weight map
- (Optional) Reference to template of parameters, either:
  - References to SExtractor configuration files (parameter and output column list)
  - Reference to 'template' XML-formatted list of parameters
- Other metadata to be included in the results (eg flux scaling, frequency width) that may not be included in the FITS image.
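The schema for this message has not yet been defined, so the following is only a sketch of what a 'document' style payload might look like. Every element and attribute name here (RunExtraction, MeasuredImage, MasterImage, WeightMap, Parameter, Metadata, Item) is a placeholder, the SOAP envelope is omitted, and the optional parameter template reference is left out for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def build_run_extraction(image_url: str,
                         parameters: Dict[str, str],
                         master_url: Optional[str] = None,
                         weight_url: Optional[str] = None,
                         metadata: Optional[Dict[str, str]] = None) -> str:
    """Return a hypothetical RunExtraction payload as an XML string."""
    root = ET.Element("RunExtraction")
    # The measured image is the only mandatory file reference.
    ET.SubElement(root, "MeasuredImage", href=image_url)
    if master_url:
        ET.SubElement(root, "MasterImage", href=master_url)
    if weight_url:
        ET.SubElement(root, "WeightMap", href=weight_url)
    # Individual parameter values, one element per parameter.
    params = ET.SubElement(root, "Parameters")
    for name, value in sorted(parameters.items()):
        ET.SubElement(params, "Parameter", name=name).text = str(value)
    # Extra metadata that may not be present in the FITS image itself.
    if metadata:
        extra = ET.SubElement(root, "Metadata")
        for name, value in sorted(metadata.items()):
            ET.SubElement(extra, "Item", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```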
SExtractor Interface
SExtractor takes a few tens of parameters from an ASCII configuration file. More can be specified on the command line, where they override the configuration file.
The configuration file may contain environment variables and file paths which SExtractor can resolve; this must be checked by the wrapper to avoid unresolvable files and values.
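One way the wrapper might perform this check is sketched below. It assumes the usual "KEYWORD value # comment" layout of a SExtractor configuration file, treats any '$' left after expansion as an unresolved environment variable, and checks only a short, assumed list of file-valued keywords; the function name and that list are illustrative, not part of any agreed design.

```python
import os
from typing import List, Sequence

# Assumed subset of keywords whose values are file paths.
FILE_KEYS = ("PARAMETERS_NAME", "FILTER_NAME", "STARNNW_NAME")


def find_unresolvable(config_text: str,
                      file_keys: Sequence[str] = FILE_KEYS) -> List[str]:
    """Return one message per value that cannot be resolved locally."""
    problems = []
    for line in config_text.splitlines():
        body = line.split("#", 1)[0].strip()   # drop trailing comments
        parts = body.split(None, 1)
        if len(parts) != 2:
            continue                           # blank or comment-only line
        key, raw = parts[0], parts[1].strip()
        value = os.path.expandvars(raw)        # unknown variables stay as-is
        if "$" in value:
            problems.append(f"{key}: unresolved environment variable in {raw!r}")
        elif key in file_keys and not os.path.isfile(value):
            problems.append(f"{key}: file not found: {value}")
    return problems
```

An empty list means that nothing unresolvable was detected among the values checked.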
The "standard out" and "standard error" streams will be monitored for application state, and for information to return to the user if the application fails.
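A minimal sketch of such monitoring follows. It simply captures both streams once the process ends and assumes that a non-zero exit code signals failure; how SExtractor actually reports problems would need to be confirmed. The function name, the timeout value and the message wording are all illustrative.

```python
import subprocess
from typing import List, Tuple


def run_and_report(cmd: List[str], timeout_s: float = 600.0) -> Tuple[bool, str]:
    """Run a command; return (succeeded, standard output or error message)."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except FileNotFoundError:
        return False, f"Could not start {cmd[0]!r}: executable not found."
    except subprocess.TimeoutExpired:
        # subprocess.run stops the child process before raising this.
        return False, f"{cmd[0]!r} did not finish within {timeout_s:.0f} s."
    if done.returncode != 0:
        # Pass on the last line of standard error as a simple diagnostic.
        lines = done.stderr.strip().splitlines() or ["no diagnostic output"]
        return False, (f"{cmd[0]!r} failed (exit code {done.returncode}): "
                       f"{lines[-1]}")
    return True, done.stdout
```

Reading the streams incrementally while the application runs would allow progress reporting, but that is beyond this sketch.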
Return output
A successful extraction will return output to the client as an XML document describing:
- the results - including the number of records found - and where the results can be found
- the parameters used in the extraction, for confirmation.
- other descriptive information, such as the version of SExtractor used.
The actual catalogue results will be made available via FTP, in VOTable format. Whether this will be pure-XML or wrapped FITS is TBC.
Lifetime of results? Do we not care from the point of view of the demo? - MCH
We need to be able to return errors if there were problems with the input (possibly as Exceptions, see below), and also return a suitable but non-error message if no objects were found.
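The format of the returned document is still open, so the following is only a sketch of the items listed above, including the non-error message for an empty result. All element and attribute names, and the message wording, are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_result_document(record_count: int,
                          results_url: str,
                          parameters: Dict[str, str],
                          sextractor_version: str) -> str:
    """Return a hypothetical result summary as an XML string."""
    root = ET.Element("ExtractionResult")
    # Number of records found and where the catalogue can be fetched.
    ET.SubElement(root, "Results", records=str(record_count), href=results_url)
    if record_count == 0:
        # An empty catalogue is a valid outcome, not an error.
        ET.SubElement(root, "Message").text = (
            "Extraction completed; no objects were found.")
    # Echo the parameters actually used, for confirmation.
    used = ET.SubElement(root, "ParametersUsed")
    for name, value in sorted(parameters.items()):
        ET.SubElement(used, "Parameter", name=name).text = str(value)
    ET.SubElement(root, "Application", name="SExtractor",
                  version=sextractor_version)
    return ET.tostring(root, encoding="unicode")
```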
Exceptions
ACE must be able to return messages to the client in the case of SExtractor failing, illegal URLs given, or invalid XML formats. Suitable user-friendly messages should be included...
Data Converter
Whatever the output of SExtractor, the data should be convertible to a standard format that can be used by ESO. This may be pure-XML VOTable or VOTable-wrapped FITS (TBD).
ACE will carry out that conversion.
Defaults
On being sent a "RequestDefaults" message, the service will return an XML document, containing default values for the parameters. The format will be suitable for cutting-and-pasting into the XML template parameter configuration file.
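As an illustration, default values held in SExtractor's "KEYWORD value # comment" configuration layout could be turned into such a document along the following lines. The element and attribute names are placeholders, since the XML template format has not been defined, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def defaults_to_xml(config_text: str) -> str:
    """Convert 'KEYWORD value # comment' lines to a hypothetical XML list."""
    root = ET.Element("Parameters")
    for line in config_text.splitlines():
        body, _, comment = line.partition("#")
        parts = body.split(None, 1)
        if len(parts) != 2:
            continue                      # blank or comment-only line
        element = ET.SubElement(root, "Parameter", name=parts[0])
        element.text = parts[1].strip()
        if comment.strip():
            # Keep the trailing comment as a human-readable description.
            element.set("description", comment.strip())
    return ET.tostring(root, encoding="unicode")
```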
Access Requirements
Remote access to the service will demonstrate AstroGrid-like activity, but deployment may be to a stand-alone portable for demonstration purposes (TBC). Therefore ACE must work both when the data and client are remote from the service, and when co-located.
Therefore ACE must be able to retrieve files by 'http', 'ftp' and 'file' protocols.
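A sketch of such retrieval, using the Python standard library's urllib (which handles all three schemes), might look like this. The function name and the choice of destination file name are assumptions for illustration.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "ftp", "file"}


def fetch_to_local(url: str, dest_dir: str) -> str:
    """Copy the file at url into dest_dir and return the local path."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Unsupported protocol in URL: {url!r}")
    # Name the local copy after the last path component of the URL.
    name = os.path.basename(parsed.path) or "download"
    dest = os.path.join(dest_dir, name)
    with urllib.request.urlopen(url) as source, open(dest, "wb") as target:
        shutil.copyfileobj(source, target)
    return dest
```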
There should be some way of resolving http/ftp URLs to a local reference when ACE is co-located with the data, to improve performance. (Low Priority?)
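One possible approach, sketched here under the assumption that the deployment can supply a table mapping URL prefixes to local directories, is a simple prefix lookup. The function name and the mapping are hypothetical.

```python
import os
from typing import Dict, Optional


def resolve_local(url: str, local_roots: Dict[str, str]) -> Optional[str]:
    """Return a local path for url if a co-located copy exists, else None."""
    for prefix, root in local_roots.items():
        if not url.startswith(prefix):
            continue
        root_abs = os.path.abspath(root)
        relative = url[len(prefix):].lstrip("/")
        candidate = os.path.abspath(os.path.join(root_abs, relative))
        # Refuse paths that escape the configured directory (e.g. via '..').
        inside = os.path.commonpath([root_abs, candidate]) == root_abs
        if inside and os.path.isfile(candidate):
            return candidate
    return None
```

When no local copy is found, the caller would fall back to retrieving the file over the network.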
As this is a stateless service, all files will be recopied to the server when a "Run Extraction" message is received; future implementations might check the remote file's date/timestamp and only copy if required.
Referring to individual FITS files within MEFs could be done by adding a [n] to the URI, where n is the file within the MEF.
(Low Priority - depends on whether images are MEFs and SExtractor can cope)
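If this convention were adopted, the wrapper would need to separate the [n] suffix from the URI before retrieving the file. A minimal sketch, with a hypothetical function name:

```python
import re
from typing import Optional, Tuple

# A trailing [n], where n is a non-negative integer.
_MEF_SUFFIX = re.compile(r"^(?P<base>.*)\[(?P<n>\d+)\]$")


def split_mef_reference(uri: str) -> Tuple[str, Optional[int]]:
    """Split 'uri[n]' into (uri, n); n is None when no suffix is given."""
    match = _MEF_SUFFIX.match(uri)
    if match is None:
        return uri, None
    return match.group("base"), int(match.group("n"))
```

Whether n counts from 0 or from 1 would still need to be agreed.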
ACE will need to provide an FTP server to access results.
Security Requirements
For the demonstration, the service will be publicly available and there are no security requirements.
Performance Requirements
For the purposes of the demonstration, the data sets can be kept small and the processing times brief by carrying out tests ahead of time to find suitable user parameters.
Moving data should be avoided where possible. For the demonstration, the applications may be co-located on one machine so that no network traffic is required (ie, internal buses will be used) but the protocol must allow them to be run across a network.
--
MartinHill - 28 Aug 2002