Gould Belt Feedback
Provided by Jeremy Yates and Niall Gibson, UCL, August 2005
- Main points are from Jeremy and Niall
- Indented comments are added by Anita
- AstroGrid cannot return a result of more than 10,000 rows (or 2,000 rows in some cases). This is a major limitation, as some of the larger catalogues contain in excess of 100 million rows.
- There is no coverage information for any of the catalogues. You do not know whether the area you intend to query is fully covered, partially covered or completely avoided by a particular catalogue. This is particularly troublesome for catalogues such as the INT-WFS, as they are spread out unevenly over the sky.
    - The Registry standards cover this in principle, but there are several problems with implementation:
        - In many cases the data provider, or the person or process harvesting the data, has not provided the information. Kevin is working on ways to make this easier.
        - Even if it is provided, at present you need to read XML, which is not user-friendly.
        - The INT-WFS is a very difficult case that the standards do not cover properly anyway: it comprises a number of large surveys (tens of square degrees or more each), observed in different combinations of wavebands. Other data sets, e.g. the Spitzer archive, may be similar.
- There is no information about the size of any of the catalogues, or about their resolution (the two are usually linked). This makes it hard to predict the size of query results: one catalogue might return 200 results, whilst another returns in excess of a million.
- In some of the catalogues, the column information shown when the 'Column' button is clicked is not complete. This can be particularly alarming if a catalogue is shown to contain no Unified Content Descriptors (UCDs).
    - These are also Registry issues. We need to work with data providers to solve this; in many cases, however, the information is readily available but not conveniently presented. We also need to persuade some data providers not to be so mean with their data: if very low row limits are a result of inadequate servers, then they should be aware that this is hindering exploitation of their data (and hence publications, references, fame and fortune...).
- Some catalogues still have results expressed in magnitudes rather than Janskys. This is fairly useless for forms of data analysis such as Spectral Energy Distribution (SED) diagrams.
- There are no facilities for unit conversion. If a catalogue is in magnitudes, and you require Janskys, you have to find the conversion formula and the Zero-Points from an external page. These should be readily available.
    - This is a perennial issue. Some data providers and users argue that it is impossible to provide generic formulae for conversion to physical units, as there are different factors for different sources (depending on their SED, brightness etc.). The counter-argument is that it is possible to convert using a standard formula to within a few percent at worst, which is usually sufficient for selection purposes or for making an SED covering many decades of wavelength, e.g. radio-IR-optical. The VOSpec tool (also via the AVO-Aladin) does offer generic conversion from magnitudes (although not yet from X-ray counts).
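The "standard formula" mentioned above is presumably the usual relation F = F0 * 10^(-0.4 m), where F0 is the zero-point flux density of the band. The following is a minimal sketch of that relation, not the VOSpec implementation. The function name is hypothetical, and the zero point must be supplied per band and per photometric system; the only value used here is the AB-system zero point of 3631 Jy, which is fixed by definition.

```python
def mag_to_jansky(mag, zero_point_jy):
    """Convert a magnitude to a flux density in Jy.

    Uses F = F0 * 10**(-0.4 * m), where F0 (zero_point_jy) is the flux
    density of a magnitude-0 source in the band in question.
    """
    return zero_point_jy * 10.0 ** (-0.4 * mag)


# AB system: magnitude 0 corresponds to 3631 Jy by definition.
# Vega-based bands need their own zero points, taken from the
# documentation of the survey or filter system (not supplied here).
AB_ZERO_POINT_JY = 3631.0

if __name__ == "__main__":
    for m in (0.0, 2.5, 15.0):
        print(m, mag_to_jansky(m, AB_ZERO_POINT_JY))
```

Every 2.5 magnitudes corresponds to a factor of 10 in flux density, so a quick sanity check is that magnitude 2.5 gives one tenth of the zero point.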
- Not all catalogues have tasks with which to query them, nor is it shown which catalogues have the required task and which do not.
- There is no explanation of how to choose the correct task for a selected catalogue; although in most cases it is fairly obvious, there are exceptions. There is also no explanation of exactly what the tasks do (possibly unnecessary).
    - In the near future we hope to offer a one-step catalogue query selection.
- There are no facilities for coordinate conversion. Not all the catalogues use the same coordinate system, so a coordinate conversion tool is required in order to perform any cross-matching.
    - This is also on the list of tools to be incorporated.
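As an illustration of the kind of conversion being asked for, here is a minimal sketch of an equatorial (J2000/ICRS) to Galactic conversion, assuming the rotation matrix published with the Hipparcos catalogue. The function name is hypothetical and this is not an AstroGrid tool; it ignores epoch and equinox changes (e.g. B1950 input), which a real conversion tool would also need to handle.

```python
import math

# Rotation matrix from J2000/ICRS equatorial to Galactic Cartesian
# coordinates (values as published with the Hipparcos catalogue).
_EQ_TO_GAL = (
    (-0.0548755604, -0.8734370902, -0.4838350155),
    (+0.4941094279, -0.4448296300, +0.7469822445),
    (-0.8676661490, -0.1980763734, +0.4559837762),
)


def equatorial_to_galactic(ra_deg, dec_deg):
    """Return Galactic (l, b) in degrees for J2000 (RA, Dec) in degrees."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    # Unit vector in the equatorial frame.
    v = (
        math.cos(dec) * math.cos(ra),
        math.cos(dec) * math.sin(ra),
        math.sin(dec),
    )
    # Rotate into the Galactic frame.
    x, y, z = (sum(m * c for m, c in zip(row, v)) for row in _EQ_TO_GAL)
    lon = math.degrees(math.atan2(y, x)) % 360.0
    # Clamp z to guard against rounding slightly outside [-1, 1].
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return lon, lat


if __name__ == "__main__":
    # Approximate J2000 position of the Galactic centre direction.
    print(equatorial_to_galactic(266.40499, -28.93617))
```

Once both catalogues are in the same system, their positions can be compared directly for cross-matching.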
- The Java Runtime Environment (JRE) compatibility is poor, particularly with the Webstart Applications Topcat and Aladin. I feel that they should, at the very least, be perfectly compatible with the latest JRE version.
    - This may have been solved by now, and we do attempt to provide full documentation; one problem is that externally provided tools also have compatibility issues outside our direct control.
- Topcat has great trouble handling tables larger than about 100,000 rows (depending, I assume, upon the particular hardware used). This is not usually sufficient considering the size of some star catalogues, especially if cross-matching is to be performed, when one might require multiple catalogues to be open at once.
    - This may be a local memory issue; further advice will be sought from the Topcat developer.
- There are no safety measures in Aladin to stop a user from attempting to open too large a table. Two machines attempting this have had major hardware problems which were, perhaps, linked. One machine's hard disk was corrupted.
    - The problem may arise when a Java application tries to allocate more memory than the host has available. Unfortunately, handling very large datasets needs a lot of memory. Failures should happen gracefully (we are looking at ways to avoid crashes), but it is advisable not to run too many memory-intensive applications at once.
- There is no explanation about how cross-matching works, and in particular, how to choose the correct degree of error with which to match catalogues. This is related to the resolution of the catalogues, but also contains a certain amount of trial-and-error.
    - There is a help page for the cross-matcher, but at present the user does have to work out for themselves what radii to use. This may be based on actual source size (e.g. if one catalogue is of clusters of galaxies) or on uncertainties, or on probabilities. It would be possible eventually to default to uncertainties, but this would not always be appropriate, not least because there might be additional systematic errors.
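To make the uncertainty-based option concrete, the sketch below shows one common way to choose a match radius: add the positional uncertainties of the two catalogues (plus any systematic term) in quadrature and multiply by a chosen number of sigma. This is an illustrative assumption, not a description of how the AstroGrid cross-matcher works; all function names, the default of 3 sigma and the brute-force search are hypothetical choices.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula, inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def match_radius_arcsec(sigma1, sigma2, n_sigma=3.0, systematic=0.0):
    """Match radius from positional uncertainties combined in quadrature."""
    return n_sigma * math.sqrt(sigma1 ** 2 + sigma2 ** 2 + systematic ** 2)


def cross_match(cat1, cat2, radius_arcsec):
    """Nearest neighbour in cat2 within the radius for each source in cat1.

    cat1 and cat2 are lists of (ra_deg, dec_deg). Brute force, O(N*M),
    so only suitable for small tables. Returns (index1, index2, sep_arcsec).
    """
    matches = []
    for i, (ra1, dec1) in enumerate(cat1):
        best = None
        for j, (ra2, dec2) in enumerate(cat2):
            sep = angular_separation_arcsec(ra1, dec1, ra2, dec2)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (j, sep)
        if best is not None:
            matches.append((i, best[0], best[1]))
    return matches
```

For catalogues of extended objects such as galaxy clusters, the radius would instead be set from the source size, as noted above.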
- How can you do a data query in MySpace, e.g. a lat and long selection, and send the output to a "new" column or a new catalogue?
    - At present this can be done using a Script (see e.g. the Colour Cutter parameterized workflow); in the future database queries and joins may be possible in MySpace (VOStore).
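Independently of the AstroGrid scripting interface (which is not shown here), the logic of such a selection can be sketched in a few lines. The example assumes a table held as a list of dictionaries with hypothetical column names "l" and "b" in degrees; it does not handle a longitude range that wraps through 0/360.

```python
def in_box(row, l_min, l_max, b_min, b_max):
    """True if the row lies inside the longitude/latitude box (degrees)."""
    return l_min <= row["l"] <= l_max and b_min <= row["b"] <= b_max


def flag_selection(rows, l_min, l_max, b_min, b_max, column="selected"):
    """Return copies of all rows with a new boolean column marking the selection."""
    return [
        dict(row, **{column: in_box(row, l_min, l_max, b_min, b_max)})
        for row in rows
    ]


def select_rows(rows, l_min, l_max, b_min, b_max):
    """Return a new catalogue containing only the rows inside the box."""
    return [dict(row) for row in rows if in_box(row, l_min, l_max, b_min, b_max)]
```

The first function corresponds to writing the result to a "new" column, the second to writing a new catalogue; in both cases the input table is left unchanged.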
--
AnitaRichards - 04 Nov 2005