r1 - 12 Feb 2002 - 17:32:19 - BobMann

UseCase: RandomSubset

PrimaryActor:

Research astronomer (or software agent running higher-level task).


EndResult:

Astronomer (or software tool) obtains desired attributes for a random sample (of specified size) of the result set of a previous VO query.


OtherActors:


PreConditions:

Astronomer has run query which has returned too large a result set for his/her planned analysis.


FlowOfEvents:

  1. astronomer expresses dismay at number of objects returned in result of query
  2. astronomer decides what total number of objects or what sampling fraction s/he really wants
  3. a random sampling of the objects from the result set of the previous query is performed, to select the desired number or fraction of objects
  4. the desired set of attributes of the random subsample is returned to the user
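
Steps 2 to 4 above can be sketched as follows. This is only an illustrative sketch, assuming the result set of the previous query is already held in memory as a list of dictionaries; the function name random_subset, its parameters and the attribute names in the usage lines are hypothetical and not part of any VO interface.

```python
import random


def random_subset(rows, attributes, n=None, fraction=None, seed=None):
    """Return the requested attributes for a random sample of buffered rows.

    rows       -- list of dicts holding the buffered result set (assumed layout)
    attributes -- names of the attributes to return for each sampled row
    n          -- total number of objects wanted, or
    fraction   -- sampling fraction wanted (give exactly one of the two)
    seed       -- optional seed, so that a sample can be reproduced
    """
    if (n is None) == (fraction is None):
        raise ValueError("specify exactly one of n or fraction")
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must lie between 0 and 1")
        n = round(fraction * len(rows))
    if not 0 <= n <= len(rows):
        raise ValueError("n must lie between 0 and the size of the result set")

    rng = random.Random(seed)
    sample = rng.sample(rows, n)  # sampling without replacement
    # Keep only the attributes the user asked for.
    return [{name: row[name] for name in attributes} for row in sample]


if __name__ == "__main__":
    # Hypothetical buffered result set with made-up attribute names.
    result_set = [{"id": i, "flux_x": 0.1 * i, "mag_r": 20.0} for i in range(1000)]
    subset = random_subset(result_set, ["id", "mag_r"], fraction=0.05, seed=42)
    print(len(subset), "objects sampled")
```

Sampling is done without replacement here, so no object appears twice in the subsample; whether that is the desired behaviour is an assumption of the sketch.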


PostCondition:

The astronomer has a sensible sample size to proceed with his/her analysis.


BasicAssumptions:

This requires some sort of random number generator, and a buffer in which to hold the results of the original query, so that they can be randomly sampled.


AlternativeFlows:

If there is likely to be a very large number of objects satisfying a query, then it may be preferable to ask for a random sample of size N from the outset, rather than get the whole set and then ask for a random sampling to be performed. That may be harder to implement, but it is probably worth the effort, or else a lot of unnecessarily lengthy queries will be undertaken.
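
One possible way to draw a sample of size N without buffering the whole set is reservoir sampling (Algorithm R), which is a standard technique and not something this UseCase prescribes. The sketch below assumes the query results arrive as a stream of rows; the function name reservoir_sample and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, n, seed=None):
    """Draw n rows uniformly at random from a stream of unknown length.

    Only n rows are held in memory at any time (Algorithm R).
    If the stream yields fewer than n rows, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(stream):
        if i < n:
            # Fill the reservoir with the first n rows.
            reservoir.append(row)
        else:
            # Row i replaces a random reservoir slot with probability n / (i + 1).
            j = rng.randint(0, i)  # randint includes both end points
            if j < n:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    # A generator stands in for rows streamed back from a query.
    rows = ({"id": i} for i in range(1_000_000))
    sample = reservoir_sample(rows, 1000, seed=7)
    print(len(sample), "objects sampled")
```

Note that this removes the need for a large buffer but still reads every matching row once, so on its own it does not shorten the query; avoiding the full scan would need support for sampling inside the data service itself.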


Discussion:

This seems like something that is definitely required in the VO, e.g. for things like feeding result datasets into a visualisation tool (as described in VisualiseMultiDimensionalResults).

This UseCase would be relevant to a ScienceProblem like ClassifyXraySourcePopulation, where one might wish to start with a general question like 'what are the optical/near-infrared colours of hard X-ray sources?'. Answering might start by searching for all XMM sources harder than a certain threshold that have optical/near-infrared fluxes or upper limits, and that might return millions of sources. If one wants to look for trends in the data by loading the fluxes into some visualisation tool (see VisualiseMultiDimensionalResults) then that might be several orders of magnitude too many objects, so one would want to look at a random sample.


Links to ScienceProblems:

This might be useful in the analysis of results from ClassifyXraySourcePopulation.


KeyReferences:



GoodStyle: Please add comments below. This area should be used for refinement of the above document. If you want to ask questions or start a dialogue with the author, please use (or create) a topic in the Use Cases Forum.
Author: Once the refinements here and comments in the forum die down, perhaps you could rewrite the problem, incorporating the comments and refinements.

-- BobMann - 12 Feb 2002
