Scratchpad for thoughts and issues in the QDM project

Workflow

  • Query UKIDSS for objects in lasSource, joined to the lasSourceXDR5PhotoObj table (to get the corresponding IDs in the SDSS)
    • Select IDs and magnitudes
    • Restrict the search to stars with corresponding objects in SDSS with distance < dmax
  • Query SDSS for star-like objects in the same region (?how?)
    • Select magnitudes and whether the object is a spectroscopic quasar
  • Join the tables using STILTS to match on the SDSS objID
    • Filter for duplicates by keeping, for each object, the match with the minimum separation. NB: also consider using Perry's colour check
  • Add colour columns, if necessary
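
The duplicate filter in the join step can be sketched in a few lines of Python. This is only an illustration of the "keep the closest match" idea, not the STILTS command actually used; the column names objID, sourceID and separation are assumptions made for the example.

```python
def keep_closest(rows, key="objID", sep="separation"):
    """For each value of `key`, keep only the row with the smallest `sep`.

    `rows` is a list of dicts, one per matched pair. The column names are
    hypothetical and stand in for whatever the joined table provides.
    """
    best = {}
    for row in rows:
        k = row[key]
        # Replace the stored row only if this one is strictly closer.
        if k not in best or row[sep] < best[k][sep]:
            best[k] = row
    return list(best.values())


if __name__ == "__main__":
    matches = [
        {"objID": 1, "sourceID": 10, "separation": 0.8},
        {"objID": 1, "sourceID": 11, "separation": 0.3},
        {"objID": 2, "sourceID": 12, "separation": 1.1},
    ]
    for row in keep_closest(matches):
        print(row)
```

Ties are resolved in favour of the first row seen, which may or may not be the right choice when a colour check is available as a tie-breaker.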

Progress

Workflow mostly complete. Using UKIDSS world results in 7845 rows from the WSA and 27856 from the SDSS, the latter constrained by the range of RA and Dec in the WSA sample. File sizes can be approximately halved (to a couple of MB apiece) by using VOTABLE-BINARY as the output format. Using UKIDSS DR2 results in an RA range of 0-360 (!) and a prohibitively large subset of the SDSS.

Questions

  1. Better to add the colours in the DB, and just get STILTS to add the colour that spans the databases? Maybe doesn't make much difference. NB UKIDSS colours are already present in the DB.
  2. Can we use the DB to remove the duplicates? YES - works well for removing UKIDSS duplicates, but runs like a dog when you try to remove the SDSS duplicates.

Snags

Anomaly Detection Algorithm fails

Table format conversion fails because it runs out of memory. Fix: use the -disk flag.

No Region keyword in ADQL/x

Or at least, as currently implemented in the DSA

Row limits

Queries, especially the SDSS ones, are going to hit row limits: 2000000 rows for SDSS and 5000000 for UKIDSS. We're trying to get 7500000 UKIDSS and 55000000 SDSS objects.
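
As a rough guide to how far over the limits we are, the minimum number of separate queries can be worked out from the figures above. This is a sketch that assumes each query could be filled exactly to its row limit, which a real split by sky region will not achieve, so the numbers are lower bounds.

```python
def queries_needed(total_rows, row_limit):
    """Smallest number of queries that can return `total_rows` rows
    when each query is capped at `row_limit` rows (integer ceiling)."""
    return -(-total_rows // row_limit)


if __name__ == "__main__":
    print("UKIDSS:", queries_needed(7500000, 5000000))
    print("SDSS:  ", queries_needed(55000000, 2000000))
```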

Large where clauses

One idea for making the query to the SDSS more precise is to make the where clause more complex, restricting it to a set of RA-Dec squares. Unfortunately we hit a limit on the size of the ADQL that the DSA can handle. The query time seems pretty constant, but above about 256 RA-Dec squares the CEA will fall over. The timings for a query against UKIDSS are: {'1': 299.01464319229126, '2': 284.99364686012268, '4': 283.37446904182434, '8': 279.90131711959839, '16': 303.73224401473999} where the key is the number of divisions that RA and Dec are each split into (i.e. 16 = 16x16 squares).
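
A minimal sketch of how such a where clause grows with the number of divisions. It splits an RA-Dec range into n x n squares and ORs them together. The column names ra and dec, the BETWEEN syntax and the example ranges are assumptions for illustration; the real queries were written in ADQL/x, which is considerably more verbose.

```python
def tiles(ra_range, dec_range, n):
    """Split the given RA and Dec ranges into an n x n grid.

    Returns a list of (ra_lo, ra_hi, dec_lo, dec_hi) tuples.
    """
    ra_min, ra_max = ra_range
    dec_min, dec_max = dec_range
    d_ra = (ra_max - ra_min) / n
    d_dec = (dec_max - dec_min) / n
    return [
        (ra_min + i * d_ra, ra_min + (i + 1) * d_ra,
         dec_min + j * d_dec, dec_min + (j + 1) * d_dec)
        for i in range(n)
        for j in range(n)
    ]


def where_clause(squares):
    """Build one OR-ed condition from a list of squares.

    Column names `ra` and `dec` are hypothetical.
    """
    parts = [
        "(ra BETWEEN %g AND %g AND dec BETWEEN %g AND %g)" % sq
        for sq in squares
    ]
    return " OR ".join(parts)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        clause = where_clause(tiles((0.0, 16.0), (0.0, 8.0), n))
        print(n, "divisions:", n * n, "squares,", len(clause), "characters")
```

In practice only a subset of the squares would be passed to where_clause, otherwise the clause is equivalent to a single box over the whole range.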

Lack of synchronous CEA calls

Would speed some bits up....

Can't run secure services such as the WSA through the workbench/voexplorer

Temporary workaround: insert the line
<?CEA-strategy-security ivo://ivoa.net/sso/soap-digital-signature?>
into the tool document after the xml declaration.
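
The edit can be scripted rather than done by hand. The sketch below treats the tool document as plain text and inserts the line directly after the XML declaration. Prepending the line when no declaration is present is an assumption of this example, not something that has been tested against the workbench.

```python
SECURITY_PI = "<?CEA-strategy-security ivo://ivoa.net/sso/soap-digital-signature?>"


def add_security_pi(xml_text):
    """Return `xml_text` with the CEA security line inserted after the
    XML declaration. Documents that already contain it are unchanged."""
    if SECURITY_PI in xml_text:
        return xml_text
    if xml_text.startswith("<?xml"):
        # The declaration ends at the first "?>".
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + SECURITY_PI + xml_text[end:]
    # Assumption: with no declaration, put the line at the very top.
    return SECURITY_PI + "\n" + xml_text


if __name__ == "__main__":
    doc = '<?xml version="1.0"?>\n<tool/>'
    print(add_security_pi(doc))
```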

Access to SDSS DR5

Through the WSA interface it's possible to access the UKIDSS data and the SDSS through a single query. In AstroGrid they're exposed as distinct DSAs, meaning that cross-matching outside the DB is necessary. See http://surveys.roe.ac.uk/wsa/sqlcookbook.html#Structured%20Query%20Language%20(SQL)

DSA SDSS doesn't have all the views we need.

There is no access to the Star view or the PhotoPrimary view through the DSA. See http://cas.sdss.org/astrodr5/en/help/browser/description.asp?n=PhotoPrimary&t=V The view cannot be recreated on the fly, as it's derived from PhotoObjAll using a mask on status. ADQL doesn't support bitwise operators, so the status can't be checked without some pretty nasty coding. Possible workaround: query on mode=1 instead, and hope this is the same thing.
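
For reference, the "nasty coding" would amount to emulating a bit test with integer division and modulo. The Python sketch below shows the arithmetic only. Whether the DSA's ADQL accepts the equivalent expression is untested, and the bit position used in the example is hypothetical, not the real SDSS status bit.

```python
def bit_is_set(value, bit):
    """Test bit number `bit` of a non-negative integer using only integer
    division and modulo, i.e. without bitwise operators."""
    return (value // 2 ** bit) % 2 == 1


if __name__ == "__main__":
    EXAMPLE_BIT = 3  # hypothetical bit position, for illustration only
    for status in (0, 8, 9, 16):
        print(status, bit_is_set(status, EXAMPLE_BIT))
```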

Appears to be no way of accessing SDSS functions

So we can't query on lambda and eta.

Old workbench doesn't support ADQL/s

Workaround: use the query builder to manually translate the ADQL/s into ADQL/x. The query builder is a bit flaky, and this is laborious. ADQL/x is horrible to work with and should be taken out and shot through the head at the earliest opportunity.

Problem running CEA apps through new VOExplorer

VOExplorer v 2007.3.alpha4 cannot run CEA apps, either manually through the UI or through the embedded AR.

Limited debugging info available to the user

Writing this workflow would not have been possible if it weren't for the privileged access I have to the system to read server logs. For instance, when invoking STILTS, the CEA reports back that all is well, even when I have a bug in my STILTS command. The only symptom that the user can see is that the resulting file is empty.

Bug invoking STILTS tmatch2

The CEA app description of the tmatch2 task needed adjusting as it didn't allow the "params" parameter to be omitted.

-- JohnTaylor - 04 Jul 2007

Topic revision: r13 - 2007-08-01 - 09:32:07 - JohnTaylor
 