Revised Observation Data Model

I have started from Observation V0.2.pdf (Obs V0.2) and followed it as closely as possible. At the same time I have tried to accommodate the general idea of hierarchical data as modelled by IDHA, and to relate the model to models or metadata for specific data collections, e.g. from Reports from data providers at InteropDataModel and Peter Lamb's draft for interferometry NOTE-RDM-2003-10-16.pdf (RDM). We should also check for consistency with the STC model and with data models which have been developed with greater resources, e.g. NRAO and the ALMA overview (by Heiko Sommer).
NB Figure numbers refer to the document cited; the same number may denote different figures in different documents.

My intention is to develop a skeleton model which contains the minimum detail to be useful for the IVOA for access to data and selection of available tools to handle data. At present the IVOA model is not likely to be used for making observations or reducing data, but for accessing reduced data or tools. The data models for individual resources will have additional (different) details but should plug into the classes in the Observation model.

This is a revised Observation model: ObsDMamsr_0.11.png
Note that my UML is technically incorrect as I don't yet have Together, so I hope syntax errors won't be too distracting (e.g. I have not distinguished between aggregation and composition). However, please point out if I have relationships of completely the wrong sort, going the wrong way, or with 1-to-many multiplicities that make no sense.

Main model Classes in alphabetical order:

| Class | Reference | Explanation/instances |
| AnalysisMethod | | How MeasuredQuantities are extracted from ObsData, e.g. SExtractor settings, time-series analysis package. |
| AuthorisationFilter | Community | Any information required for access to restricted data, to supply to a Community model. |
| Characterisation | Obs V0.2 Fig.4 | Data properties such as coverage; should be STC-consistent. MeasuredQuantities may have a different Characterisation from the parent Observation, e.g. spatial coverage in objects/deg^2, or a noise-based error in addition to the systematic astrometric error. |
| Curation | Obs V0.2 Fig.4, RMv1.0 | Details needed by tools/registry, e.g. data type (FITS MEF, VOTable etc.). |
| DataType | IDHA, RMv1.0, Quantity | Similar to IDHA Coding and more detailed than RMv1.0, e.g. flavours of FITS such as MEFs; what sort of data, e.g. image, datacube, PV image, visibilities (a short-cut for some of the details in Characterisation, to supply to the Registry and Quantity models). |
| Identity | RMv1.0 | Details needed by the Registry. |
| MeasuredQuantities | Obs V0.2 Fig.5 | Extracted parameters (plural, like "data"). |
| ObsConf | Obs V0.2 Fig.4, RDM Fig.5 | Instrumental characteristics fixed for the observation, e.g. receiver, ground-based telescope location, array configuration. May not include the synthesised beam and other properties which are affected by processing. |
| ObsData | Obs V0.2 Fig.5 | Data products (visibility sets, images etc.): whatever the provider supplies, raw or (ideally) science-ready processed data. |
| ObservatoryLog | INT-WFS model | Catalogue of observations (Obs V0.2 Section 3 Data Collection/Archive); may belong elsewhere? |
| Observation | Obs V0.2 Fig.5, IDHA | A set of data which shares a Provenance. |
| Processing | Obs V0.2 Fig.4, RDM Fig.5 | Called variants of Pipeline in IDHA and Obs V0.2. This could include correlation (RDM Fig.5 SynthBeamConf), calibration, resampling etc. One set of observations can have different Processing parameters (e.g. different weightings), possibly established on the fly, giving ObsData with different Characterisation. Can this be correctly associated using Versioning? |
| Project | RDM Fig.4 | Proposal details including project ID (sometimes vital for identity if position can't be used). |
| Provenance | Obs V0.2 Fig.4 | A telescope or a detector or a simulator or a group of these. |
| Service | RMv1.0 | Details needed by the registry for data access, e.g. access protocol. |
| SourceSchedule | RDM Fig.6, Obs V0.2 Fig.4 | Outline of observing strategy (renamed from RDM Observation to avoid confusion). Includes Obs V0.2 Fig.4 Target as a sub-class. |

The gray boxes show classes which exchange information with other models, e.g. the Registry (RM v1.0 or whatever is current), the Quantity model or a Community model. I think that it is better to repeat overlapping metadata at the minimum level required for human intelligibility, for example the Quantity model may define how many axes a data set has but the Observation model should know if it has a flat image, a data-cube, an MEF (multi-extension FITS), a 7-D visibility data set etc.

Some classes should perhaps be connected differently, e.g. MeasuredQuantities directly to Observation. I am happy for experts to rearrange them, as long as each Processing or AnalysisMethod stays associated with the correct data product, via Versioning or otherwise.

The coloured boxes can be expanded into sub-classes in a data-set-specific way; ObsDMamsr_0.11eg.png contains examples (it can be printed legibly on A4). These can contain whatever classes suit the data provider, as long as the information passed to the VO is standard. For example, the Project ID is often needed, but the proposal author details are optional for public-domain data unless they are the same as the Identity details. The metadata from any local model which the VO doesn't want to interpret should be available as a whole if needed. For example, I want a radio light curve for a variable object in the AVO demo region NGC1333, but only time-averaged images are published. If a hypothetical Interferometry Data Centre can take the visibilities and extract the light curve, then all the VO needs to know is some details from Characterisation and the identity of the Provenance. The Interferometry Data Centre (or RadioNet) can worry about the SourceSchedule etc. On the other hand, if users have to extract the light curve themselves, the VO can only supply the visibility data plus a literal transcription of the detailed Provenance metadata, which the VO can't interpret; either the user knows what to do or they can contact the data provider directly for advice. But at the least, the VO will have put the user and the data provider in touch.

As a result of discussion with Mireille Louys and Francois Bonnarel, the model is now explicitly recursive, to handle observations which are themselves combinations of other observations that can also stand alone. For example, multi-colour optical data may be used separately to extract sources for an SED, or stacked for maximum sensitivity. MERLIN and VLA images of the Hubble Deep Field show structure on different scales, either separately or combined. In such cases additional or different processing is applied to the combined observations.
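
The recursion can be pictured as a self-referencing class. This is a minimal Python sketch, assuming that an Observation simply holds a list of component Observations plus an optional Processing record; the attribute names and the `all_provenances` helper are hypothetical illustrations, not part of the model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Processing:
    """Hypothetical stand-in for the Processing class (weighting, combination...)."""
    description: str


@dataclass
class Observation:
    """A set of data sharing a Provenance; may combine other Observations."""
    name: str
    provenance: str  # a telescope, detector, simulator or group of these
    processing: Optional[Processing] = None
    # Recursion: component Observations that can also stand alone.
    components: List["Observation"] = field(default_factory=list)

    def is_composite(self) -> bool:
        return len(self.components) > 0

    def all_provenances(self) -> Set[str]:
        """Collect every Provenance reachable through the component tree."""
        found = {self.provenance}
        for child in self.components:
            found |= child.all_provenances()
        return found


# Illustrative use: MERLIN and VLA images combined into one Observation.
merlin = Observation("HDF MERLIN image", "MERLIN")
vla = Observation("HDF VLA image", "VLA")
combined = Observation(
    "HDF MERLIN+VLA image",
    "MERLIN+VLA",
    Processing("joint imaging of both data sets"),
    [merlin, vla],
)
```

Because each component is a full Observation, the MERLIN and VLA images keep their own Processing (here none) while the combined image carries the extra processing step.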

Characterisation is one of the most important classes. This is described as a table in Obs V0.2 and I present a revision of this, with comments:

I have added Uncertainty to Location; this is meant to cover absolute or non-statistical accuracy, such as how well the position of a phase-reference source is known. I think that Mapping should come before Location since, functionally, you need to know what coordinate system you are using before the rest makes sense.

I found some of the row labels ambiguous; although they weren't wrong they may cause confusion so I suggest replacements (in brackets).

I think it needs to be understood that almost all the quantities except Location may be given as a range, so I have included the maximum structure size detectable by an interferometer (smallest spatial frequency) under Resolution. Bounds may be non-continuous, as explained in Obs V0.2 2.4.

Do we need a separate Velocity column? In some cases spectral and velocity Bounds and/or Support will differ. For example, an observation of the OH lines at 1665.402 and 1667.359 MHz, shifted to a Vlsr of -40 km/s, in two 0.5 MHz bands would have frequency Bounds of roughly 1665 - 1667 MHz and a frequency Support of slightly more than 0.5 MHz around each shifted centre frequency, but a single velocity Bound/Support of 80 km/s (after correcting for the bandpass function, i.e. discarding the dead channels at the band edges).
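
The OH example can be checked numerically. This sketch assumes the radio velocity convention, v = c(1 - nu/nu0), and band edges at exactly +/-0.25 MHz from each shifted line; neither the convention nor the helper names come from the model.

```python
C_KMS = 299792.458  # speed of light in km/s


def observed_freq(rest_mhz: float, v_kms: float) -> float:
    """Radio-convention Doppler shift: nu = nu0 * (1 - v/c)."""
    return rest_mhz * (1.0 - v_kms / C_KMS)


def radio_velocity(freq_mhz: float, rest_mhz: float) -> float:
    """Inverse of observed_freq: v = c * (1 - nu/nu0)."""
    return C_KMS * (1.0 - freq_mhz / rest_mhz)


REST_MHZ = [1665.402, 1667.359]  # OH main-line rest frequencies (from the text)
VLSR_KMS = -40.0                 # source velocity (from the text)
BAND_MHZ = 0.5                   # width of each band (from the text)

# Frequency Support: one band per line, centred on the shifted frequency.
freq_support = []
for nu0 in REST_MHZ:
    centre = observed_freq(nu0, VLSR_KMS)
    freq_support.append((centre - BAND_MHZ / 2, centre + BAND_MHZ / 2))

# Frequency Bounds: outer limits of all the bands together.
freq_bounds = (min(lo for lo, _ in freq_support), max(hi for _, hi in freq_support))

# Velocity range of each band relative to its own rest frequency
# (higher frequency means more negative velocity, hence the sort).
vel_support = [
    tuple(sorted((radio_velocity(lo, nu0), radio_velocity(hi, nu0))))
    for (lo, hi), nu0 in zip(freq_support, REST_MHZ)
]

# The bands overlap almost completely in velocity: one common range.
vel_common = (max(v[0] for v in vel_support), min(v[1] for v in vel_support))

print("Frequency Support (MHz):", [(round(a, 3), round(b, 3)) for a, b in freq_support])
print("Frequency Bounds (MHz):", tuple(round(x, 3) for x in freq_bounds))
print("Common velocity range (km/s):", tuple(round(v, 1) for v in vel_common))
```

The two frequency bands are disjoint, but each spans about 90 km/s centred on -40 km/s, so the velocity Support collapses to a single range; the 80 km/s quoted above is what remains after the edge channels are discarded.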

Filling needs to be defined as a subset of Support. I have removed 'Factor' because Filling isn't always a fraction, e.g. it could be sources per deg^2.

Fitting Errors are the errors in determining the position, size etc. of sources by fitting Gaussian components. Their size depends mainly on the signal-to-noise ratio (snr), and for Gaussian-like sources they can be much smaller than the beam size; I guess the same goes for line shapes in the spectral domain. These are stochastic errors.
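
A commonly quoted rule of thumb, used here as an assumption rather than anything defined by the model, is that the 1-sigma error of a fitted Gaussian position is about the beam FWHM divided by twice the snr. The sketch also combines it in quadrature with the systematic Location Uncertainty (e.g. phase-reference position error), assuming the two are independent; the function names are hypothetical.

```python
import math


def fitted_position_error(beam_fwhm: float, peak: float, rms: float) -> float:
    """Rule-of-thumb 1-sigma error of a fitted position: beam / (2 * snr).

    Same angular units as beam_fwhm; assumes a Gaussian-like source
    and an snr well above ~5.
    """
    snr = peak / rms
    return beam_fwhm / (2.0 * snr)


def total_position_error(fitting_error: float, systematic_error: float) -> float:
    """Combine the stochastic fitting error with the systematic
    Location Uncertainty, assuming they are independent."""
    return math.hypot(fitting_error, systematic_error)


# Illustrative numbers: 0.2 arcsec beam, 1 mJy/beam peak, 0.05 mJy/beam rms,
# 3 mas systematic astrometric error.
fit = fitted_position_error(0.2, 1.0, 0.05)
total = total_position_error(fit, 0.003)
```

With an snr of 20 the fitting error is 5 mas, 40 times smaller than the beam, which is why Fitting Error appears as a separate entry from Restoring Beam under Resolution.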

Number of Visibilities and integration time are more relevant to other things (image quality, field of view...) than to the time quantities. I think that, in general, interferometry data centres rather than VOs should run algorithms such as working out the FoV from the channel width and integration time, and supply the VO with the result.
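
As an illustration of the kind of calculation a data centre would do, here is an order-of-magnitude sketch. It assumes the simple criteria that bandwidth smearing matters once (channel width / frequency) x (radius / beam) reaches about 1, and time-average smearing once (Earth rotation rate x integration time) x (radius / beam) does; stricter criteria (e.g. a fixed percentage loss of peak flux) give smaller radii. All names and example values are hypothetical.

```python
OMEGA_EARTH = 7.2921e-5  # Earth's rotation rate in rad/s


def bandwidth_smearing_radius(beam: float, freq: float, chan_width: float) -> float:
    """Radius where (chan_width/freq) * (r/beam) ~ 1 (same units as beam)."""
    return beam * freq / chan_width


def time_smearing_radius(beam: float, int_time_s: float) -> float:
    """Radius where OMEGA_EARTH * int_time_s * (r/beam) ~ 1."""
    return beam / (OMEGA_EARTH * int_time_s)


def smearing_limited_fov(beam, freq, chan_width, int_time_s, primary_beam_radius):
    """Order-of-magnitude usable field radius: the smallest of the limits."""
    return min(
        primary_beam_radius,
        bandwidth_smearing_radius(beam, freq, chan_width),
        time_smearing_radius(beam, int_time_s),
    )


# Illustrative: 0.15 arcsec beam at 1400 MHz, 1 MHz channels, 8 s averaging,
# 900 arcsec primary-beam radius (all values assumed; arcsec and MHz).
fov = smearing_limited_fov(0.15, 1400.0, 1.0, 8.0, 900.0)
```

The VO would then only need the resulting radius, not the channel width or integration time that went into it.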

These examples are for a specific set of observations, relevant to radio interferometry. I don't think that the IVOA can produce a single set of wording to fill in the table to cover all situations (entire observatory catalogues, single dish, optical, x-ray...). The same word means different things in different cultures and people will make mistakes. Rather, we need to assess each row-column pair (e.g. Sensitivity.Spectral) for each sort of data currently on offer or desirable. Some entries may be irrelevant; in other cases, can we derive a useful number (or an expression, if VOs can handle expressions)? Thus the jargon for Sensitivity.Spectral can be transmission curve, grating response, bandpass function or anything else, as long as the data provider gives a meaningful value if relevant. The issue for VOs is how far we are prepared to go to get this. In complex cases, e.g. an interferometer field of view, the interferometry data centre should provide the value. For single-dish data, could the VO calculate the field of view as lambda/diameter from the FITS header if that contained lambda and diameter?
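
A sketch of the single-dish case, assuming the header is a plain dict (an astropy `fits.Header` also supports `.get`). The frequency is read from the WCS keyword RESTFRQ, or the older RESTFREQ; since FITS has no standard dish-diameter keyword, the diameter is looked up from TELESCOP in a small illustrative table, which is an assumption. The optional `taper_factor` allows for the primary-beam FWHM being somewhat larger than lambda/D for a tapered illumination.

```python
import math

C_M_S = 299_792_458.0  # speed of light, m/s

# Illustrative dish diameters (metres) keyed by TELESCOP value; a real
# service would take these from a Provenance/ObsConf record.
DISH_DIAMETER_M = {"LOVELL": 76.0, "EFFELSBERG": 100.0}


def single_dish_fov_arcmin(header, taper_factor=1.0):
    """Return taper_factor * lambda / D in arcmin, or None if unknown."""
    freq_hz = header.get("RESTFRQ", header.get("RESTFREQ"))
    if freq_hz is None:
        return None
    diameter = DISH_DIAMETER_M.get(str(header.get("TELESCOP", "")).upper())
    if diameter is None:
        return None
    wavelength_m = C_M_S / float(freq_hz)
    return math.degrees(taper_factor * wavelength_m / diameter) * 60.0


fov = single_dish_fov_arcmin({"TELESCOP": "Effelsberg", "RESTFRQ": 1.5e9})
```

Returning None when either quantity is missing reflects the point above: the VO can only do this if the header actually supplies lambda and diameter.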

| Class | Spatial | Temporal | Spectral | Flux |
| Mapping (or MappingFrame) | WCS | Time Stamp | Spectral WCS | Jy/K; Jy/bm |
| Location | Field Centre; Phase Centre | Date Obs | Reference Freq; Reference Chan; Rest Frequency, Velocity | |
| Location Uncertainty | Astrometry | Time Reference Err | Standard Used | Photometry |
| Bounds (or SensitivityBounds) | Outer limits of Observation | Start Time; Stop Time | Extrema of Freq. or Vel. | rms Noise in Stokes I, Q, U, V etc.; min % polarization believable; Polarization Angle Error; Ratio Error, e.g. in Spectral Index |
| Support | Polygons etc.; Primary FWHM | Scan length?; Full track? | Start/end of bands, or bandwidths and reference channels | Peak flux density in total intensity image; Extrema of polarization or absorption images (NB in either case the image may not cover the whole imageable interferometer field of view); Total visibility flux on shortest interferometer baseline |
| Filling | Image, visibility data: Fractional Coverage (of image or uv planes); MeasuredQuantities: compact sources/deg^2, filling factor of extended emission e.g. Halpha, CO | Cycle Time; Number Visibilities | Fractional Coverage | |
| Sensitivity (or SensitivityFunction) | FoV Function (of int. time, b/w, primary beam); Sidelobe Pattern | Synthesis time | Bandpass Function | Dynamic Range; Polarization Leakage |
| Resolution | Restoring Beam; Fitting Error; Max spatial scale | Integration Time; PSR Bin | Channel Width | rms Noise |
| Sample Precision | Natural Beam; Fitting Error; (Pixel) | as resoln? | Channel Width | |
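
Since each row-column pair has to be assessed separately, one way to hold Characterisation values is a sparse store keyed by "Row.Axis" strings. This is a minimal sketch under that assumption: the class, its methods and the space-free row names (e.g. LocationUncertainty) are hypothetical, and the stored values are only illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Row and column names taken from the Characterisation table.
ROWS = ("Mapping", "Location", "LocationUncertainty", "Bounds", "Support",
        "Filling", "Sensitivity", "Resolution", "SamplePrecision")
AXES = ("Spatial", "Temporal", "Spectral", "Flux")


class Characterisation:
    """Sparse store of Characterisation values keyed by 'Row.Axis'."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str], object] = {}

    @staticmethod
    def _key(path: str) -> Tuple[str, str]:
        row, _, axis = path.partition(".")
        if row not in ROWS or axis not in AXES:
            raise ValueError(f"unknown Characterisation cell: {path!r}")
        return row, axis

    def set(self, path: str, value: object) -> None:
        self._cells[self._key(path)] = value

    def get(self, path: str) -> Optional[object]:
        """Return the value, or None if none was supplied (or it is irrelevant)."""
        return self._cells.get(self._key(path))

    def unfilled(self) -> List[str]:
        """Row-column pairs for which no value has been supplied."""
        return [f"{r}.{a}" for r in ROWS for a in AXES if (r, a) not in self._cells]


# Illustrative entries; the values are hypothetical.
char = Characterisation()
char.set("Resolution.Spatial", {"restoring_beam_arcsec": (0.2, 0.15)})
char.set("Sensitivity.Spectral", "bandpass function supplied by data centre")
```

Leaving cells empty rather than forcing a value matches the idea that some entries are irrelevant for some data; adding a Velocity column would only mean extending AXES.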

Revised after comments from Alberto Micol -- AnitaRichards - 08 May 2004

-- AnitaRichards - 07 May 2004

VO flux density issues

Most radio images are measured in Jy/beam (or mJy/beam etc.), but none of the VO tools I have used can interpret that. The beam size is roughly determined by the Provenance, but within a factor of 2 it depends on processing choices (noted as Restoring Beam under Resolution in Characterisation). The pixel size of an image is chosen arbitrarily (there are sensible limits, but it is not a direct function of resolution). If you use e.g. AVO-Aladin or SExtractor to 'measure' the flux at a particular pixel, the flux density is actually in Jy/beam, but the tools assume that it is in Jy/pixel or, in some cases, Jy/arcsec^2.

I think that the most important step towards VOs handling radio data properly is to work out how to extract the beam size from (often obscure) places in the FITS headers of radio images, and to use that information intelligently. This could be done by applying a linear multiplication factor to convert to Jy/pixel or Jy/arcsec^2 on the fly when required, e.g. for comparison with non-radio data (it is much easier than converting magnitudes to physical units!). However, results from source extraction should ideally be given in both units, as radio astronomers will expect Jy/beam.
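
A sketch of the conversion, assuming a header given as a plain dict. It first looks for the widely used BMAJ/BMIN keywords (in degrees) and otherwise falls back to the AIPS convention of recording the CLEAN beam in a HISTORY card; the pixel size comes from CDELT1/CDELT2. The function names are hypothetical. For a Gaussian beam the solid angle is pi x BMAJ x BMIN / (4 ln 2), and Jy/pixel = Jy/beam x (pixel area / beam area).

```python
import math
import re

# AIPS records the CLEAN beam in HISTORY cards such as
# "AIPS   CLEAN BMAJ=  1.5000E-04 BMIN=  1.2000E-04 BPA=  30.00"
_AIPS_BEAM = re.compile(r"BMAJ=\s*([-+0-9.Ee]+)\s+BMIN=\s*([-+0-9.Ee]+)")


def beam_from_header(header: dict):
    """Return (bmaj, bmin) in degrees, or None if no beam can be found."""
    if "BMAJ" in header and "BMIN" in header:
        return float(header["BMAJ"]), float(header["BMIN"])
    # Fall back to the most recent HISTORY card recording a beam.
    for card in reversed(list(header.get("HISTORY", []))):
        match = _AIPS_BEAM.search(str(card))
        if match:
            return float(match.group(1)), float(match.group(2))
    return None


def beam_area_deg2(bmaj: float, bmin: float) -> float:
    """Solid angle of an elliptical Gaussian beam with FWHMs bmaj, bmin."""
    return math.pi * bmaj * bmin / (4.0 * math.log(2.0))


def jy_beam_to_jy_pixel_factor(header: dict) -> float:
    """Multiply a Jy/beam value by this to get Jy/pixel."""
    beam = beam_from_header(header)
    if beam is None:
        raise ValueError("no restoring beam found in header")
    pixel_area = abs(float(header["CDELT1"]) * float(header["CDELT2"]))
    return pixel_area / beam_area_deg2(*beam)


def jy_beam_to_jy_arcsec2_factor(header: dict) -> float:
    """Multiply a Jy/beam value by this to get Jy/arcsec^2."""
    beam = beam_from_header(header)
    if beam is None:
        raise ValueError("no restoring beam found in header")
    return 1.0 / (beam_area_deg2(*beam) * 3600.0 ** 2)
```

For a 1 arcsec circular beam and 0.1 arcsec pixels the Jy/pixel factor is about 0.0088, i.e. a tool that reads Jy/beam as Jy/pixel overestimates the flux by over a hundred times.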

There is a similar issue in converting X-ray counts to flux. This is more complex, as the conversion depends on the source spectral index, but there are various standard approximations which could be applied.
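
To show why the spectral index matters, here is a deliberately idealised sketch: it assumes a power-law photon spectrum N(E) proportional to E^-gamma (photon index gamma) and a constant effective area over the band, so the energy flux is the photon flux times the mean photon energy in the band. Real instruments have energy-dependent responses, which is why missions publish their own conversion factors; the function names here are hypothetical.

```python
import math

KEV_TO_ERG = 1.602176634e-9  # 1 keV in erg


def _power_integral(e1: float, e2: float, k: float) -> float:
    """Integral of E**k dE from e1 to e2 (log form when k == -1)."""
    if abs(k + 1.0) < 1e-12:
        return math.log(e2 / e1)
    return (e2 ** (k + 1.0) - e1 ** (k + 1.0)) / (k + 1.0)


def mean_photon_energy_kev(e1: float, e2: float, gamma: float) -> float:
    """Mean photon energy in [e1, e2] keV for N(E) proportional to E**-gamma."""
    return _power_integral(e1, e2, 1.0 - gamma) / _power_integral(e1, e2, -gamma)


def counts_to_energy_flux(rate: float, area_cm2: float,
                          e1: float, e2: float, gamma: float) -> float:
    """Energy flux (erg/cm^2/s) from a count rate (counts/s), assuming a
    flat effective area area_cm2 over the band (an idealisation)."""
    photon_flux = rate / area_cm2  # photons / cm^2 / s
    return photon_flux * mean_photon_energy_kev(e1, e2, gamma) * KEV_TO_ERG
```

In a 0.5 - 2 keV band the mean photon energy falls from 1.25 keV for gamma = 0 to about 0.92 keV for gamma = 2, so the same count rate maps to noticeably different fluxes.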

Polarization

There is almost no existing VO metadata (apart from limited UCDs) to characterise polarization. This should not be difficult to remedy but needs coordination, as different jargon is used in the optical/NIR (e.g. ordinary v. extraordinary rays) than in the radio examples given in the table above.
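
Whatever the jargon, the polarization entries in the table (min % polarization believable, Polarization Angle Error, Polarization Leakage) can be expressed in Stokes parameters. This sketch assumes equal rms noise in Q and U, a simple bias correction and the high-snr angle-error formula sigma/(2P); the "believable" threshold (k-sigma, but never below the residual leakage) is a hypothetical rule, not an agreed definition.

```python
import math


def polarization_summary(i: float, q: float, u: float, sigma: float):
    """Return (percent polarization, angle in deg, angle error in deg).

    sigma is the rms noise in Q and U (assumed equal). The bias correction
    and the angle error formula are only reliable well above the noise.
    """
    p_obs = math.hypot(q, u)
    p = math.sqrt(max(p_obs ** 2 - sigma ** 2, 0.0))  # simple debiasing
    percent = 100.0 * p / i
    angle = 0.5 * math.degrees(math.atan2(u, q))
    angle_err = math.degrees(sigma / (2.0 * p)) if p > 0 else float("inf")
    return percent, angle, angle_err


def min_believable_percent(i: float, sigma: float, leakage_percent: float,
                           k: float = 3.0) -> float:
    """Hypothetical rule: a k-sigma detection in P, but never below the
    residual instrumental Polarization Leakage."""
    return max(100.0 * k * sigma / i, leakage_percent)
```

Optical and radio providers could then supply the same quantities even if they describe the measurement differently.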

FITS Headers

Much vital information is found in FITS headers, and some archives rely on these for metadata. We need to establish what we can and can't get out of them, and the correspondence with classes and variables in our data model. FitsImageSupport describes current AVO practice, for example when using IDHA.
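
One way to record that correspondence is a keyword-to-class table. The mapping below is a hypothetical starting point, not an agreed one: the path strings are made up to mirror the Characterisation table, and BMAJ/BMIN/OBSRA/OBSDEC are widespread radio conventions rather than core FITS standard keywords. Cards with no mapping are kept whole, so metadata the VO cannot interpret can still be passed to the user verbatim.

```python
from collections import defaultdict

# Hypothetical keyword-to-model mapping (Class.Row.Axis paths).
KEYWORD_MAP = {
    "TELESCOP": "Provenance",
    "INSTRUME": "Provenance",
    "OBJECT": "SourceSchedule.Target",
    "DATE-OBS": "Characterisation.Location.Temporal",
    "OBSRA": "Characterisation.Location.Spatial",
    "OBSDEC": "Characterisation.Location.Spatial",
    "CTYPE1": "Characterisation.Mapping.Spatial",
    "CRVAL1": "Characterisation.Mapping.Spatial",
    "CDELT1": "Characterisation.Mapping.Spatial",
    "RESTFRQ": "Characterisation.Location.Spectral",
    "BUNIT": "Characterisation.Mapping.Flux",
    "BMAJ": "Characterisation.Resolution.Spatial",
    "BMIN": "Characterisation.Resolution.Spatial",
}


def sort_header(header: dict):
    """Split header cards into model paths plus a pass-through remainder."""
    mapped = defaultdict(dict)
    unmapped = {}
    for key, value in header.items():
        path = KEYWORD_MAP.get(key)
        if path is None:
            unmapped[key] = value
        else:
            mapped[path][key] = value
    return dict(mapped), unmapped
```

Keeping the mapping as data rather than code means each data provider could publish its own table without changing the sorting logic.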

-- AnitaRichards - 08 May 2004

revised after more mailing list comments and much discussion with Francois Bonnarel and Mireille Louys -- AnitaRichards - 10 May 2004

-- AnitaRichards - 22 Jun 2004

See report of IVOA workshop for updates.

Test example of characterisation of radio interferometry

Topic revision: r8 - 2004-06-22 - 16:03:02 - AnitaRichards
 
This is the AstroGrid Development Wiki.
