r9 - 08 Jul 2003 - 11:08:56 - ElizabethAuden

Tentative amendments to Keith's RegistrySchema

I envisage this as used for the first light pass through the Registry, to determine whether DataSets may contain data of interest, or whether the query needs data entirely outside their boundaries. This would be followed by a more precise interrogation including UCDs of a small(er) number of DataSets.

Hanisch et al. is the most recent (presently v.6) version of the document also referred to as (not)Bob's.

Mostly this follows Hanisch et al., with some changes, e.g.
- should use 'angular' not 'spatial' for sky coverage/position/resolution
- some things added


Suggested standard units/conventions:

see http://www.iau.org/IAU/Activities/nomenclature/units.html (This is just for the ResourceMetadata; for DataSets generally the wider conventions of CDS can be used).

I have suggested units; in some cases I suggest alternatives where the conversion may be tricky or where being totally consistent might lead to very small/large numbers (e.g. degrees for angular position, but arcsec for error is more usual). However, I would prefer to be consistent: where alternatives are given, the first unit (before the first '?') is preferred. Approximate conversions suffice to answer 'Is this catalogue any use?' with 'maybe/no'.

This has implications for the user query; for the very first iterations we may have to force the user to use standard units but very soon we should be able to interconvert Jy/Mag/?x-ray units? and wavelength/freq/eV units etc. For Resource metadata selection this does not have to be precise.
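
A minimal sketch of the kind of approximate interconversion meant here, assuming metres as the standard spectral unit and Jy as the standard flux unit. The function names are illustrative only, the constants are rounded, and the magnitude zero point is band-dependent, so it is left for the caller to supply.

```python
# Approximate conversions to the suggested standard units.
# Rounded constants are adequate for a "maybe/no" ResourceMetadata selection.
C_M_PER_S = 2.998e8    # speed of light in m/s
H_EV_S = 4.136e-15     # Planck constant in eV s


def frequency_hz_to_metres(freq_hz):
    """Wavelength in metres for a frequency in Hz."""
    return C_M_PER_S / freq_hz


def energy_ev_to_metres(energy_ev):
    """Wavelength in metres for a photon energy in eV."""
    return H_EV_S * C_M_PER_S / energy_ev


def magnitude_to_jansky(mag, zero_point_jy):
    """Flux density in Jy; zero_point_jy is the flux of a mag 0 source in that band."""
    return zero_point_jy * 10 ** (-0.4 * mag)
```

With these, a query at 5 GHz maps to roughly 0.06 m and a 1 keV photon to roughly 1.24e-9 m, which is precise enough to pick a waveband.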

Should the units be added to the schema?

Data types and null values

Would it be simplest if every element occurred at least once, with null values used as suggested in the [[http://cdsweb.u-strasbg.fr/doc/VOTable/votable-1-0.htx][VOTable documentation]]? E.g. use NULL for strings with no value and NaN or +INF/-INF for decimals. This allows us to sort and (de)prioritise DataSets lacking the relevant ResourceMetadata entry.
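
As a sketch of how null values could be used to (de)prioritise DataSets, the following ranks records by one decimal element and pushes NaN/INF entries to the end. The record layout (a dict per DataSet) and the names `is_null` and `prioritise` are assumptions made for illustration; they are not part of the schema.

```python
import math

# Hypothetical ResourceMetadata records; only the element name comes from the schema.
datasets = [
    {"name": "A", "angularresolution": 2.0},
    {"name": "B", "angularresolution": float("nan")},   # no value recorded
    {"name": "C", "angularresolution": 0.05},
]


def is_null(value):
    """True for the decimal null values suggested above (NaN, +INF, -INF)."""
    return math.isnan(value) or math.isinf(value)


def prioritise(records, element):
    """Smallest value first; records with a null value for `element` go last."""
    def key(record):
        value = record[element]
        # NaN does not compare reliably, so nulls get a fixed second key.
        return (is_null(value), 0.0 if is_null(value) else value)
    return sorted(records, key=key)


ranked = [r["name"] for r in prioritise(datasets, "angularresolution")]
```

Here the DataSet without a resolution is kept in the result but sorted behind those that have one.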


CONTENT

"subjectkeywords" (new element)

One or more keywords taken from the dataset header, e.g. a subset of the third column on the Vizier catalogue selection page. See http://adc.gsfc.nasa.gov/adc/adc_keyword_index.html and http://vizier.u-strasbg.fr/doc/ADCkwds.htx . We should add keywords from the ADC list or the Vizier simplification sparingly, as required.

Note planetary nebulae are Nebulae not planets
Galaxies means external galaxies, not the Milky Way

Is there anything equivalent for Solar/STP?

In Hanisch et al. 'subject' is included in curation metadata, but I feel it fits better in content. However I don't really mind; this and some other things listed under CONTENT below should maybe be in CURATION?

"type" means FITS, ASCII etc?

I am using 'table' to mean data which could be searched directly in a database or be converted to VOTable, e.g. a list of sources and properties. Other 'nDim' data which require special viewers/extraction software, e.g. FITS, will always? have an associated table describing them, e.g. a list of pointings and other observational details.

I think that we can cover whether nDim data are images, spectra etc. by whether elements like "decmin" or "spectralresolution" have meaningful values, or the null value.

COVERAGE

Wavelength coverage: Hanisch et al. has seven contiguous divisions to cover the electromagnetic spectrum, based on the Vizier categories. These are, in metres (approx. number of catalogues in Vizier):
Radio > 100e-6 (450)
IR 1e-6 - 100e-6 (550)
Optical 0.3e-6 - 1e-6 (2550)
UV 100e-9 - 300e-9 (130)
EUV 10e-9 - 100e-9 (1)
X-ray 0.01e-9 - 10e-9 (170)
Gamma-ray < 0.01e-9 (25)
This is just for convenience; the exact boundaries do not matter, and if overlapping or close to an edge a dataset should be classified in both adjacent wavebands. However, for ease of searching we might want to get similar numbers of catalogues per waveband, which suggests amalgamating the UV and EUV and subdividing optical. We might also want to subdivide radio into
Radio > 10e-3
mm-wave 0.1e-3 - 10e-3
I get the impression that in the existing science cases where these data might be relevant, the user either wants just radio or just the new mm-wave waveband, or wants both but at different stages of the query. mm-wave covers a lot of molecular lines (single-dish and interferometry), SCUBA and eventually ALMA. But this should be seen as a pragmatic question, not a point of principle.
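
The classification rule above (put a dataset in every waveband it overlaps or comes close to) can be sketched as follows. This assumes the radio/mm-wave split is adopted, takes the gamma-ray boundary as 0.01e-9 m so that it meets the X-ray band, and uses the band names from the "wavelengthrange" element. The `margin` parameter, which decides what counts as "close to an edge", is a hypothetical choice.

```python
import math

# (name, short limit, long limit) in metres, shortest wavelengths first.
WAVEBANDS = [
    ("gammaray", 0.0, 0.01e-9),
    ("xray", 0.01e-9, 10e-9),
    ("xuv", 10e-9, 100e-9),
    ("uv", 100e-9, 300e-9),
    ("optical", 0.3e-6, 1e-6),
    ("ir", 1e-6, 100e-6),
    ("mmwave", 0.1e-3, 10e-3),
    ("radio", 10e-3, math.inf),
]


def wavebands(wavelengthshort, wavelengthlong, margin=0.1):
    """Names of all bands that the dataset range overlaps or nearly touches.

    The range is widened by `margin` (a fraction) at each end, so a dataset
    close to a boundary is listed in both adjacent bands.
    """
    low = wavelengthshort * (1.0 - margin)
    high = wavelengthlong * (1.0 + margin)
    return [name for name, short, long_ in WAVEBANDS
            if low <= long_ and high >= short]
```

For example, a range of 1.2e-6 to 2.2e-6 m falls only in "ir", while 1.05e-6 to 2e-6 m is also listed under "optical" because it lies within the margin of the 1e-6 m boundary.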

"angularfraction" (a fraction) is for datasets containing images or imageable data; "sourcedensity" (sources/deg^2) is for datasets containing lists of sources with positions. Note that the total fractional coverage is different from the resolution.

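A sketch of how these two quantities might be derived from the other elements, under two assumptions that the text does not state: that "angularfraction" is the imaged area divided by the area of the RA/Dec bounding box, and that "sourcedensity" is "tablenrows" divided by that same area. RA wrap-around through 0 degrees is ignored.

```python
import math

DEG_PER_RAD = 180.0 / math.pi


def box_area_deg2(ramin, ramax, decmin, decmax):
    """Solid angle of an RA/Dec box in square degrees (limits in degrees)."""
    return (ramax - ramin) * DEG_PER_RAD * (
        math.sin(math.radians(decmax)) - math.sin(math.radians(decmin)))


def angular_fraction(imaged_area_deg2, ramin, ramax, decmin, decmax):
    """Assumed definition: fraction of the bounding box that is actually imaged."""
    return imaged_area_deg2 / box_area_deg2(ramin, ramax, decmin, decmax)


def source_density(tablenrows, ramin, ramax, decmin, decmax):
    """Assumed definition: sources per square degree over the bounding box."""
    return tablenrows / box_area_deg2(ramin, ramax, decmin, decmax)
```

The whole sky (RA 0 to 360, Dec -90 to +90) comes out at about 41253 square degrees, which is a quick check on the area formula.
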
Decimal JD would be simplest for date/time coverage, in which case a separate unit e.g. sec would have to be used for time resolution.
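
For illustration, a decimal JD can be obtained from a UTC date/time as below. The function name is an assumption, and leap seconds are ignored, which is harmless at the precision needed for ResourceMetadata.

```python
from datetime import datetime, timezone

UNIX_EPOCH_JD = 2440587.5    # JD of 1970-01-01 00:00:00 UTC
SECONDS_PER_DAY = 86400.0


def to_julian_date(moment):
    """Decimal Julian Date for a timezone-aware datetime."""
    return moment.timestamp() / SECONDS_PER_DAY + UNIX_EPOCH_JD


startdate = to_julian_date(datetime(2000, 1, 1, 12, 0, 0, tzinfo=timezone.utc))
enddate = to_julian_date(datetime(2000, 1, 2, 12, 0, 0, tzinfo=timezone.utc))

# Durations come out in days, so time resolution in sec needs a factor of 86400.
span_sec = (enddate - startdate) * SECONDS_PER_DAY
```

Noon UTC on 1 January 2000 gives JD 2451545.0.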

In future iterations AstroGrid may want to go for indexing/matrix representation rather than the shapes suggested by Hanisch et al. - see Indexing the Sky. Very simple versions could be used for the ResourceMetadata - e.g.
Angular: RA/Dec in bins of 1 degree in Dec, and 1, 1.2, 1.5, 2, 4, 10, 360 degrees of RA as you approach the poles;
Electro-magnetic spectrum: log(metres) in increments of e.g. 0.1
Time - days
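
A rough sketch of such an index follows. The text does not say at which declinations each RA width applies, so the rule used here (pick the listed width closest, as a ratio, to 1/cos(Dec) at the centre of the Dec bin) is purely an assumption, as are all the function names.

```python
import math

RA_WIDTHS_DEG = [1, 1.2, 1.5, 2, 4, 10, 360]   # widths listed above


def ra_bin_width(dec_deg):
    """Assumed rule: listed width nearest (in ratio) to 1/cos(Dec)."""
    ideal = 1.0 / max(math.cos(math.radians(dec_deg)), 1e-12)
    return min(RA_WIDTHS_DEG, key=lambda w: abs(math.log(w / ideal)))


def sky_bin(ra_deg, dec_deg):
    """(Dec index, RA index) with 1 degree Dec bins counted from the south pole."""
    dec_index = min(math.floor(dec_deg + 90.0), 179)
    width = ra_bin_width(dec_index - 89.5)      # evaluated at the bin centre
    return dec_index, int((ra_deg % 360.0) // width)


def spectral_bin(wavelength_m, step=0.1):
    """Index of the log10(metres) bin; rounding guards against float noise."""
    return math.floor(round(math.log10(wavelength_m) / step, 9))


def time_bin(julian_date):
    """Index of the one-day bin containing a decimal JD."""
    return math.floor(julian_date)
```

Two datasets can then be compared cheaply by checking whether their sets of bin indices intersect, before any precise geometry is attempted.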

RESOLUTION

Note that it is easier to express spectral resolution as (finest channel width)/(central value), e.g. delta-lambda/lambda, as this avoids unit problems, but this cannot be done in a universal way for other sorts of resolution.
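
A small check of why the fractional form avoids unit problems: for a narrow channel the ratio is the same whether it is worked out in frequency or in wavelength. The numbers below (a 1 MHz channel at 5 GHz) are made up for illustration.

```python
C_M_PER_S = 2.998e8    # speed of light in m/s (rounded)


def spectral_resolution(channel_width, central_value):
    """Dimensionless resolution; both arguments must be in the same unit."""
    return channel_width / central_value


# Hypothetical channel: 1 MHz wide, centred on 5 GHz.
centre_hz, width_hz = 5.0e9, 1.0e6
res_from_frequency = spectral_resolution(width_hz, centre_hz)

# The same channel expressed in metres.
centre_m = C_M_PER_S / centre_hz
width_m = (C_M_PER_S / (centre_hz - width_hz / 2)
           - C_M_PER_S / (centre_hz + width_hz / 2))
res_from_wavelength = spectral_resolution(width_m, centre_m)
```

Both routes give about 2e-4, so the stored value does not depend on how the data provider chose to describe the spectral axis.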

We should use the best value in the data for now, and later include algorithms to allow for e.g. angular resolution as a function of frequency for multi-frequency data.

DATA QUALITY

Things like angularresolution and sensitivity will initially probably be given as the best value of all errors (systematic and random) correctly combined. However, in some data sets these may cover a wide range. E.g. astrometry error can depend on sensitivity and resolution; in observing logs resolution may be frequency-dependent. Ultimately we should be able to express these things as functions which are evaluated depending on other bits of the data set or even the query. E.g. the MERLIN archive covers frequencies from 0.408 to 22 GHz and has a best resolution of 0."008, but this is at 22 GHz; the resolution at 5 GHz is 0."050, and if you want higher resolution you have to go to e.g. the EVN archive.
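
One possible shape for such a function, sketched with the two MERLIN values quoted above. The log-log interpolation between tabulated points and the choice to return NaN (the null value) outside the tabulated range are assumptions made for illustration, not a description of how MERLIN resolution actually scales.

```python
import math

# (frequency in GHz, best resolution in arcsec), taken from the text above.
MERLIN_RESOLUTION = [(5.0, 0.050), (22.0, 0.008)]


def resolution_at(freq_ghz, table):
    """Resolution in arcsec at freq_ghz, interpolated in log-log space.

    Returns NaN outside the tabulated range rather than guessing.
    """
    points = sorted(table)
    if freq_ghz < points[0][0] or freq_ghz > points[-1][0]:
        return float("nan")
    for (f0, r0), (f1, r1) in zip(points, points[1:]):
        if f0 <= freq_ghz <= f1:
            t = (math.log(freq_ghz) - math.log(f0)) / (math.log(f1) - math.log(f0))
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    return points[0][1]    # only reached for a single-point table
```

A query needing 0."020 at 5 GHz would then be answered 'no' for this archive, even though its single best value of 0."008 looks sufficient.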

UCDs

Should there also be an element to link to the UCDs for the dataset?

Many of my changes are probably incorrect XML, sorry, but I hope the intention is clear.


dataService


Schema: http://www.w3.org/2001/XMLSchema

include: schemaLocation="serviceLocation.xsd"

elements:

CONTENT
"content"               string          (see elements following)
"facility"              string
"instrument"            string
"format"                string          (VOTable, ascii, FITS etc.)
"briefsummary"          string
"tablenrows"            integer         (Number of rows in table)
"tablencols"            integer         (Number of columns in table)
"tablesize"             decimal         (bytes - size of table excl. linked nDim data)
"ndimdatasetsizemin"    decimal         (pixels x pixels)
"ndimdatasetsizemax"    decimal         (pixels x pixels)
"nndimdatasets"         integer         (number of nDim data sets)    
"type"                  string          (archive, survey, catalogue, bibliography, 
                                         journal, library, outreach, education, 
                                         eporesource, integrated, nameresolver)
"subjectkeyword"        string          (Galaxies, Milky Way, Nebulae, Planets, 
                                         Solar system, Stars)
---

COVERAGE
"coverage"              string          (see elements following)
"wavelengthrange"       string          (gammaray, xray, xuv, uv, optical, ir, 
                                         mmwave, radio)
"wavelengthshort"       decimal         (metres)
"wavelengthlong"        decimal         (metres)
"ramin"                 decimal         (degrees)
"ramax"                 decimal         (degrees)
"decmin"                decimal         (degrees)
"decmax"                decimal         (degrees)
"sensitvity"            decimal         (Jansky? also allow Magnitude? eV?)
"startdate"             decimal         (JD.xxx) 
"enddate"               decimal         (JD.xxx) 
"angularfraction"       decimal         (dimensionless fraction)
"spectralfraction"      decimal         (dimensionless fraction)
"temporalfraction"      decimal         (dimensionless fraction)
"sourcedensity"         decimal         (counts per square degree)
---

RESOLUTION
"resolution"            string          (see elements following)
"angularresolution"     decimal         (degrees? arcsec?)
"spectralresolution"    decimal         (dimensionless fraction)
"temporalresolution"    decimal         (sec)
---

DATAQUALITY
"dataquality"           string          (see elements following)
"astrometryerror"       decimal         (degrees? arcsec?)
"photometryerror"       decimal         (Jy? Magnitudes? eV? dimensionless fraction?)
"timingerror"           decimal         (sec)

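As an illustration of the first light pass these elements are meant to support, the sketch below compares the coverage limits of one resource with those of a query and answers 'maybe' or 'no'. Element names are taken from the list above; the dict layout, the function names and the example values are hypothetical, NaN limits are treated as unknown, and RA wrap-around through 0 degrees is ignored.

```python
import math

RANGE_ELEMENTS = [
    ("ramin", "ramax"),
    ("decmin", "decmax"),
    ("wavelengthshort", "wavelengthlong"),
]


def ranges_overlap(lo_a, hi_a, lo_b, hi_b):
    """True if two closed ranges overlap; a NaN limit means 'unknown', so True."""
    if any(math.isnan(v) for v in (lo_a, hi_a, lo_b, hi_b)):
        return True
    return lo_a <= hi_b and lo_b <= hi_a


def first_pass(resource, query):
    """'no' if any coverage range lies wholly outside the query, else 'maybe'."""
    for lo, hi in RANGE_ELEMENTS:
        if not ranges_overlap(resource[lo], resource[hi], query[lo], query[hi]):
            return "no"
    return "maybe"


# Made-up values, for illustration only.
resource = {"ramin": 180.0, "ramax": 200.0, "decmin": 55.0, "decmax": 70.0,
            "wavelengthshort": 0.01, "wavelengthlong": 0.2}
query_inside = {"ramin": 189.0, "ramax": 190.0, "decmin": 62.0, "decmax": 63.0,
                "wavelengthshort": 0.06, "wavelengthlong": 0.06}
query_south = dict(query_inside, decmin=-30.0, decmax=-20.0)
```

Resources that answer 'maybe' would then go forward to the more precise interrogation, including UCDs.
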
See examples of usage drawn from science cases in the BrownDwarfMetadataList in BrownDwarfRegistryRequirements and the metadata list in DeepFieldSurveysRegistryRequirements

Examples of ResourceMetadata for the 2MASS and MERLIN+VLA HDF(N) datasets.

-- AnitaRichards - 25 Apr 2003
