Research astronomer (or software agent running a higher-level task).
Astronomer (or software tool) obtains desired attributes for a random sample (of specified size) of the result set of a previous VO query.
Astronomer has run a query that returned too large a result set for his/her planned analysis.
- astronomer expresses dismay at number of objects returned in result of query
- astronomer decides what total number of objects or what sampling fraction s/he really wants
- a random sampling of the objects from the result set of the previous query is performed, to select the desired number or fraction of objects
- the desired attributes of the random subsample are returned to the user
The astronomer has a sensible sample size to proceed with his/her analysis.
Need some sort of random number generator, and a buffer within which to hold the results from the original query, so that they can be randomly sampled.
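The buffer-and-sample requirement can be sketched as follows. This is only an illustration, assuming the result set fits in memory as a list of dictionaries; the function name `sample_result_set` and its parameters are hypothetical and not part of any VO interface.

```python
import random


def sample_result_set(rows, attributes, n=None, fraction=None, seed=None):
    """Return the chosen attributes for a uniform random subsample of rows.

    Hypothetical helper: exactly one of n (object count) or fraction
    (sampling fraction between 0 and 1) must be given.
    """
    if (n is None) == (fraction is None):
        raise ValueError("specify exactly one of n or fraction")

    # Buffer holding the full result of the previous query.
    buffer = list(rows)

    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must lie between 0 and 1")
        n = round(fraction * len(buffer))

    # Never ask for more objects than the buffer contains.
    n = min(n, len(buffer))

    # Seeded generator so that a sample can be reproduced if required.
    rng = random.Random(seed)
    chosen = rng.sample(buffer, n)  # sampling without replacement

    # Return only the attributes the user asked for.
    return [{name: row[name] for name in attributes} for row in chosen]
```

For example, `sample_result_set(rows, ["id", "flux"], fraction=0.01)` would return the two named attributes for roughly one percent of the buffered objects.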
If there is likely to be a very large number of objects satisfying a query, then it may be preferable to ask for a random sample of size N from the outset, rather than get the whole set and then ask for a random sampling to be performed. That may be harder to implement, but it is probably worth the effort, or else a lot of unnecessarily lengthy queries will be undertaken.
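One well-known way to draw a sample of size N without first holding the whole result set is reservoir sampling (Algorithm R). It is offered here only as a sketch of what "from the outset" might look like on the client side, assuming the query results arrive as a stream of rows; it is not a technique prescribed by this use case, and `reservoir_sample` is a hypothetical name. Sampling inside the database itself would avoid transferring the rows at all, but that depends on what each archive supports.

```python
import random


def reservoir_sample(stream, n, seed=None):
    """Draw a uniform random sample of up to n rows from an iterable.

    Only n rows are held in memory at any time, so the full result set
    never needs to be buffered.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(stream):
        if i < n:
            # Fill the reservoir with the first n rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability n / (i + 1) by choosing
            # a slot in [0, i] and replacing only if it falls inside
            # the reservoir.
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < n:
                reservoir[j] = row
    return reservoir
```

If the stream yields fewer than N rows, all of them are returned. Note that this still requires the server to produce every matching row, so it saves memory on the client but not query time.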
Discussion:
This seems like something that is definitely required in the VO, e.g. for things like feeding result datasets into a visualisation tool (as described in VisualiseMultiDimensionalResults).
This UseCase would be relevant to a ScienceProblem like ClassifyXraySourcePopulation, where one might wish to start with a general question like 'what are the optical/near-infrared colours of hard X-ray sources?'. Answering might start by searching for all XMM sources harder than a certain threshold that have optical/near-infrared fluxes or upper limits, and that might return millions of sources. If one wants to look for trends in the data by loading the fluxes into some visualisation tool (see VisualiseMultiDimensionalResults) then that might be several orders of magnitude too many objects, so one would want to look at a random sample.
This might be useful in the analysis of results from ClassifyXraySourcePopulation.
--
BobMann - 12 Feb 2002