r10 - 07 Oct 2003 - 09:29:12 - MartinHill

Requirements
for the
Astronomical Catalogue Extractor
(ACE)

Martin Hill
mch@roe.ac.uk

The Astronomical Catalogue Extractor is the Astrogrid team's contribution to the AVO Demonstrator (AVODemo). It is essentially a SExtractor application published as a web service.

Introduction

Background

The Astrophysical Virtual Observatory (AVO) is an organisation studying the development of a distributed astrophysics data set and processing 'grid'. The AVO team is made up of several interested parties, including astronomers, ESO, CDS, etc.

The AVO Demonstrator is a technology demonstrator being prepared for January. Its main purpose is to develop and test technologies that may be used to assemble the AVO. Its main design driver is to carry out scientific analysis of Spectral Distributions for two surveys: the Deep Fields (Hubble North and Chandra South) and the Magellanic Clouds.

This document describes the work to be carried out by the AstroGrid team, as its contribution to the AVO Demonstrator.

Data Sources

The data will be GOODS (Great Observatories Origins Deep Survey) data sets, from the IR, X-ray and optical wave bands. Some X-ray data may also be provided from Chandra (TBD).

A 'cutout' (image cropping) service will provide a square image for a requested area of the sky. This image will be in MEF (Multi-Extension FITS) format, including the measured image and corresponding weight maps. This image may be mosaiced. See http://www.eso.org/science/goods/brickwall.html

The image file itself is unlikely to contain 'metadata' about, for example, flux scalings or frequency width. Where will this information come from?

Catalogue Extraction

This is the process of analysing pixel images to locate objects. There are many tools available for this purpose, some of them discipline-specific, such as SExtractor for optical and AIPS for radio. Typically the extraction process is 'tweaked' under the direct control of the astronomer, who will specify various parameters describing exactly how the tool is to search for objects. Results are typically given as two-dimensional tables (rows = objects, columns = information about the objects) in FITS or ASCII formats.

Supporting Documents

See AVODemo page for links to Glossary, References, etc.

Understanding the Requirements

Drivers

Since we have no formal requirements, we look briefly here at the 'drivers' for the demonstrator; that is to say, the reasons for making one and the results expected. This gives us a context for our requirements.

The demonstrator is a 'first iteration'; it may require considerable rework after the January deadline. Obviously, the less work required afterwards the better, as we will be able to work on other things. Tasks are nevertheless being separated into "For the Demo" and "For Later", the latter probably being for Autumn 2003.

Science Drivers

The demonstrator is expected to provide real science results, as follows:
  • Deep Field studies for cosmology. (ESO keen on Chandra Deep Field South)
  • Magellanic Clouds for galactic astronomy
  • Unify collaboration issues with the different wavelength disciplines

NB, the Demo is to provide tools to carry out this science, not actually carry out the science. This is important to prevent 'scooping' other real scientists involved in these areas.

Technical Drivers

The demonstrator is to trial technologies discussed by the AVO teams, as follows:
  • Develop the UI and its interaction with other services
  • Develop a remote service (ACE) with appropriate protocols
  • Concentrate on AVO/Grid issues rather than writing new applications (for this reason, existing scientific tools should be used if possible)

General Drivers

Encourage the AVO teams to focus on the practical aspects of developing AVO services.

The AVO Demonstrator

See Francoise's work breakdown: http://wiki.astrogrid.org/pub/Astrogrid/AVODemo/demobreakdown_V03.doc

AstroGrid Contribution

The AstroGrid team will provide a web service that takes an image and extraction parameters, and returns a table (catalogue) of objects found in that image.

Where possible, the team will design and implement these products in a 'griddy' fashion, in order to test grid technologies and provide a first-stage framework for building further grid-like VO functions.

Extracting object catalogues from images

SExtractor is a commonly used tool for analysing a pixel image and producing a table of the objects it has found in that image. Many (approx 100) parameters can be used to define how the extraction is carried out; for example, how 'definite' the object must be compared to the background noise.

Input is via two ASCII files, one giving parameters and another the columns to be provided in the result set. Parameters can also be given as part of the command line.
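As a rough illustration of how a wrapper might combine the two input routes, the sketch below assembles a command line that names a configuration file and appends individual parameters as overrides. The executable name `sex`, the `-c` option and the `-KEYWORD value` override form follow common SExtractor usage but should be checked against the installed version; the function name and file names are hypothetical.

```python
from typing import List, Mapping


def build_sextractor_command(
    image: str,
    config_file: str,
    overrides: Mapping[str, str],
    executable: str = "sex",  # assumption: some installations use another name
) -> List[str]:
    """Return an argv list for one extraction run.

    Parameters given in `overrides` are appended after the configuration
    file so that they take precedence over the values in that file.
    """
    argv = [executable, image, "-c", config_file]
    # Sort the overrides so the same request always yields the same command.
    for name, value in sorted(overrides.items()):
        argv += ["-" + name, str(value)]
    return argv


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_sextractor_command(
        "cutout.fits",
        "ace.sex",
        {"PARAMETERS_NAME": "ace.param", "DETECT_THRESH": "1.5"},
    ))
```

Returning a list rather than a single string means the command can be passed to a process launcher without shell quoting.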

Output formats can be selected by setting certain input parameter(s), and are essentially FITS or ASCII variations. A rough calculation gives n x 10k objects for a typical extraction on a typical GOODS image (although we should be dealing only with small cropped 'cutout' images).
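To make the 'rows = objects, columns = information' shape of an ASCII result concrete, here is a minimal sketch of a reader. It assumes a header of comment lines of the form `# <index> <NAME> <description>` followed by whitespace-separated rows, one value per column; that layout is an assumption about the chosen output variation, and the sample data is invented.

```python
from typing import Dict, List


def parse_ascii_catalogue(text: str) -> List[Dict[str, str]]:
    """Turn an ASCII catalogue into a list of {column name: value} rows."""
    columns: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line: "# 1 NUMBER Running object number"
            fields = line[1:].split()
            if len(fields) >= 2 and fields[0].isdigit():
                columns.append(fields[1])
            continue
        # Data line: pair each value with its column name, in order.
        rows.append(dict(zip(columns, line.split())))
    return rows


if __name__ == "__main__":
    sample = (
        "#   1 NUMBER   Running object number\n"
        "#   2 X_IMAGE  Object position along x\n"
        "  1  10.5\n"
        "  2  99.25\n"
    )
    for row in parse_ascii_catalogue(sample):
        print(row)
```

Values are kept as strings here; converting them to numbers, and handling columns that hold more than one value, is left to whatever performs the final format conversion.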

Assumptions

The following assumptions have been made:
  • ACE will carry out no image processing beyond that done by SExtractor. For example, chi-squared images will be provided by ESO, and only a single image (along with its weighting and perhaps a related chi-squared image) will be accepted per service instance.
  • Visualisers are not part of this work package.
  • Source data will be the Hubble Deep Field images or the Magellanic Clouds; however, the design should allow for any area of the sky.
  • The web service will not carry out any extra validation on the input parameters; if SExtractor breaks while running, a simple error message will be returned to the user.

Requirements

Functional Requirements

This section details the functions that the AstroGrid team will provide, and the requirements of each.

Test Harness/User Interface

The UI will provide a mechanism for the user to:
  • Specify any and all of the parameters to be sent to the service, including file (eg image) locations.
  • 'Post' the parameters to ACE
  • Display the results in a human-readable form.

Parameter Form

A form of some sort will be available to the Aladin user to enter parameters for the extraction.

SExtractor Grid Service

This application will take an image and a parameter set, and apply the SExtractor application to them, returning the resultant object catalogue.

It will include the following functions:

Grid Service

In theory, the SExtractor process may not respond immediately, and so should be considered a stateful AstroGrid Service. However, implementing full Grid Service functionality is beyond the scope of the demonstrator, and only the following grid-like functions will be provided (TBD!):
  • The service must provide metadata when requested, essentially as a WSDL file and a schema for the "input" message:
    • Parameters Required
    • Value Units
    • Value Types
  • The client must be able to terminate a service instance. (Perhaps the service may stop the application if the http connection is closed TBD)

For the moment, the service will be considered 'stateless'.

Starting an Extraction

The "RunExtraction" message will be SOAP, with parameters listed in 'document' style. A schema will be required.

This message will include the following:

  • Individual Parameter values
  • Reference (URL) to measured image
  • (Optional) Reference to 'Master' image (eg chi-squared)
  • (Optional) Reference to weight map
  • (Optional) Reference to template of parameters, either:
    • References to SExtractor configuration files (parameter and output column list)
    • Reference to 'template' XML-formatted list of parameters
  • Other metadata to be included in the results (eg flux scaling, frequency width) that may not be included in the FITS image.
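The contents listed above could be carried in a document-style payload along the following lines. This is only a sketch: no schema has been agreed, so every element and attribute name here (`RunExtraction`, `MeasuredImage`, `Parameter` and so on) is hypothetical, and the SOAP envelope that would wrap the payload is omitted.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Optional


def build_run_extraction(
    image_url: str,
    parameters: Mapping[str, str],
    master_image_url: Optional[str] = None,
    weight_map_url: Optional[str] = None,
    extra_metadata: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the XML payload of a "RunExtraction" request as a string."""
    root = ET.Element("RunExtraction")
    ET.SubElement(root, "MeasuredImage").text = image_url
    # Optional references are only written when supplied.
    if master_image_url:
        ET.SubElement(root, "MasterImage").text = master_image_url
    if weight_map_url:
        ET.SubElement(root, "WeightMap").text = weight_map_url
    params = ET.SubElement(root, "Parameters")
    for name, value in parameters.items():
        ET.SubElement(params, "Parameter", name=name).text = str(value)
    # Metadata not present in the FITS image, echoed back in the results.
    if extra_metadata:
        meta = ET.SubElement(root, "Metadata")
        for name, value in extra_metadata.items():
            ET.SubElement(meta, "Item", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_run_extraction(
        "http://archive.example.org/cutout.fits",  # placeholder URL
        {"DETECT_THRESH": "1.5"},
        weight_map_url="http://archive.example.org/cutout.weight.fits",
    ))
```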

SExtractor Interface

SExtractor takes a few tens of parameters from an ASCII configuration file. More can be specified on the command line; these override the configuration file.

The configuration file may contain environment variables and file paths which SExtractor can resolve; this must be checked by the wrapper to avoid unresolvable files and values.
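One way the wrapper might perform the environment-variable part of this check is sketched below. It assumes shell-style references (`$NAME` or `${NAME}`) and treats everything after `#` on a line as a comment; both are assumptions to confirm against the configuration files actually used, and the variable names in the example are invented.

```python
import os
import re
from typing import List, Mapping, Optional

# Matches ${NAME} or $NAME; exactly one of the two groups is non-empty.
_VARIABLE = re.compile(r"\$(?:\{(\w+)\}|(\w+))")


def unresolved_variables(
    config_text: str, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """List variables referenced in the configuration but not defined."""
    env = os.environ if env is None else env
    missing: List[str] = []
    for line in config_text.splitlines():
        body = line.split("#", 1)[0]  # ignore trailing comments
        for braced, bare in _VARIABLE.findall(body):
            name = braced or bare
            if name not in env and name not in missing:
                missing.append(name)
    return missing


if __name__ == "__main__":
    config = (
        "FILTER_NAME   $ACE_HOME/default.conv   # convolution filter\n"
        "STARNNW_NAME  ${ACE_DATA}/default.nnw\n"
    )
    print(unresolved_variables(config, env={"ACE_HOME": "/opt/ace"}))
```

A similar pass could then test that each resolved file path exists before SExtractor is started.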

The "standard out" and "standard error" streams will be monitored for application state, and to gather information to return to the user if the application fails.
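A minimal sketch of capturing both streams is given below. It waits for the process to finish and then inspects the exit code and captured text, which is simpler than live monitoring and may be enough for a stateless demonstrator; the class name, function name and timeout value are hypothetical.

```python
import subprocess
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunResult:
    succeeded: bool   # True when the process exited with code 0
    exit_code: int
    stdout: str
    stderr: str


def run_tool(argv: Sequence[str], timeout_seconds: float = 600.0) -> RunResult:
    """Run an external tool, capturing standard out and standard error."""
    completed = subprocess.run(
        list(argv),
        capture_output=True,      # collect both streams
        text=True,                # decode bytes to str
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired if exceeded
        check=False,              # report failure through RunResult instead
    )
    return RunResult(
        succeeded=completed.returncode == 0,
        exit_code=completed.returncode,
        stdout=completed.stdout,
        stderr=completed.stderr,
    )
```

On failure, `stderr` (or its last few lines) is the natural candidate for the simple error message returned to the user.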

Return output

A successful extraction will return output to the client as an XML document describing:
  • the results - including the number of records found - and where the results can be found
  • the parameters used in the extraction, for confirmation.
  • other descriptive information, such as the version of SExtractor used.

The actual catalogue results will be made available via FTP, in VOTable format. Whether this will be pure-XML or wrapped FITS is TBC.

Lifetime of results? Do we not care from the point of view of the demo? - MCH

We need to be able to return errors if there were problems with the input (possibly as Exceptions, see below), and also return a suitable but non-error message if no objects were found.
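The output document, including the non-error case where nothing was found, might be assembled as in the sketch below. The element names are hypothetical since no schema exists yet, and the result location and version string are placeholders supplied by the caller.

```python
import xml.etree.ElementTree as ET
from typing import Mapping


def build_result_document(
    record_count: int,
    catalogue_url: str,
    parameters: Mapping[str, str],
    sextractor_version: str,
) -> str:
    """Return the XML summary sent to the client after an extraction."""
    root = ET.Element("ExtractionResult")
    # Number of records found and where the catalogue can be fetched.
    results = ET.SubElement(root, "Results", recordCount=str(record_count))
    ET.SubElement(results, "Location").text = catalogue_url
    if record_count == 0:
        # Not an error: the run completed but found nothing.
        ET.SubElement(root, "Message").text = "No objects were found."
    # Echo the parameters used, for confirmation.
    used = ET.SubElement(root, "ParametersUsed")
    for name, value in parameters.items():
        ET.SubElement(used, "Parameter", name=name).text = str(value)
    ET.SubElement(root, "Tool", name="SExtractor", version=sextractor_version)
    return ET.tostring(root, encoding="unicode")
```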

Exceptions

ACE must be able to return messages to the client in the case of SExtractor failing, illegal URLs given, or invalid XML formats. Suitable user-friendly messages should be included...
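One possible shape for these messages is a small exception hierarchy in which each failure carries a user-friendly message alongside the technical detail. The class names and message wording below are hypothetical; how the message would be mapped onto a SOAP fault is not shown.

```python
import xml.etree.ElementTree as ET


class AceError(Exception):
    """Base class; `user_message` is the text shown to the client."""

    def __init__(self, user_message: str, detail: str = "") -> None:
        super().__init__(user_message)
        self.user_message = user_message
        self.detail = detail  # technical detail, e.g. for a log file


class IllegalUrlError(AceError):
    """A file reference in the request could not be used."""


class InvalidXmlError(AceError):
    """The request (or a referenced template) is not valid XML."""


class ExtractionFailedError(AceError):
    """SExtractor itself reported a failure."""


def parse_request(xml_text: str) -> ET.Element:
    """Parse a request, converting parser errors into an ACE exception."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise InvalidXmlError(
            "The request is not well-formed XML.", detail=str(exc)
        ) from exc


def describe(error: AceError) -> str:
    """Format an error as '<kind>: <message>' for return to the client."""
    return f"{type(error).__name__}: {error.user_message}"
```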

Data Converter

Whatever the output of SExtractor, the data should be convertible to a standard format that can be used by ESO. This may be pure-XML VOTable or VOTable-wrapped FITS (TBD).

ACE will carry out that conversion.

Defaults

On being sent a "RequestDefaults" message, the service will return an XML document, containing default values for the parameters. The format will be suitable for cutting-and-pasting into the XML template parameter configuration file.

Access Requirements

Remote access to the service will demonstrate Astrogrid-like activity, but deployment may be to a stand-alone portable for demonstration purposes (TBC). Therefore ACE must work both when the data and client are remote from the service, and when co-located.

ACE must therefore be able to retrieve files by the 'http', 'ftp' and 'file' protocols.
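As a sketch of how the three protocols could be handled uniformly, the function below copies a referenced file to a local path using Python's standard `urllib.request`, which opens `http`, `ftp` and `file` URLs through the same call. The function name is hypothetical, and rejecting every other scheme is a design assumption rather than a stated requirement.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "ftp", "file"}


def fetch_to_local(url: str, destination: Path) -> Path:
    """Copy the file at `url` to `destination` and return the destination."""
    scheme = urlparse(url).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Unsupported protocol {scheme!r} in {url!r}")
    # urlopen handles all three schemes; stream the body to disk in chunks.
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```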

There should be some way of resolving http/ftp URLs to a local reference when ACE is co-located with the data, to improve performance. (Low Priority?)

As this is a stateless service, all files will be recopied to the server when a "Run Extraction" message is received; future implementations might check the remote file's date/timestamp and only copy if required.
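One simple approach would be a configured table of URL prefixes and the local directories that mirror them, as sketched below. The table contents, host name and function name are hypothetical; a URL that matches no prefix is left for normal remote retrieval.

```python
import posixpath
from typing import Mapping, Optional


def resolve_local(url: str, local_roots: Mapping[str, str]) -> Optional[str]:
    """Map a URL to a local path if it falls under a known prefix.

    `local_roots` maps URL prefixes to local directories. Returns None
    when no prefix matches, meaning the file must be fetched remotely.
    """
    for prefix, root in local_roots.items():
        if not url.startswith(prefix):
            continue
        relative = url[len(prefix):].lstrip("/")
        candidate = posixpath.normpath(posixpath.join(root, relative))
        # Refuse paths that climb out of the configured directory ("..").
        if candidate.startswith(root.rstrip("/") + "/"):
            return candidate
    return None


if __name__ == "__main__":
    roots = {"http://archive.example.org/goods/": "/data/goods"}
    print(resolve_local("http://archive.example.org/goods/tile42.fits", roots))
    print(resolve_local("http://elsewhere.example.org/tile42.fits", roots))
```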

Referring to individual FITS files within MEFs could be done by adding a [n] to the URI, where n is the file within the MEF. (Low Priority - depends on whether images are MEFs and SExtractor can cope)
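A sketch of how the wrapper might separate such a suffix from the rest of the URI follows. The function name is hypothetical, and whether n counts from 0 or 1 is left open here, as it is above.

```python
import re
from typing import Optional, Tuple

# A trailing "[n]" where n is one or more digits.
_EXTENSION_SUFFIX = re.compile(r"^(?P<base>.*)\[(?P<index>\d+)\]$")


def split_extension(uri: str) -> Tuple[str, Optional[int]]:
    """Split 'base[n]' into (base, n); return (uri, None) if no suffix."""
    match = _EXTENSION_SUFFIX.match(uri)
    if match is None:
        return uri, None
    return match.group("base"), int(match.group("index"))


if __name__ == "__main__":
    print(split_extension("ftp://archive.example.org/cutout.fits[2]"))
    print(split_extension("ftp://archive.example.org/cutout.fits"))
```

The base URI can then be fetched as usual, with the index used only when the file is handed to the extraction step.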

ACE will need to provide an FTP server to access results.

Security Requirements

For the demonstration, the service will be publicly available and there are no security requirements.

Performance Requirements

For the purposes of the demonstration, the data sets can be kept small and the processing times brief by carrying out tests ahead of time to find suitable user parameters.

Moving data should be avoided where possible. For the demonstration, the applications may be co-located on one machine so that no network traffic is required (i.e. internal buses will be used), but the protocol must allow them to be run across a network.

-- MartinHill - 28 Aug 2002
