TaskRunner and CEA with FTP

I wondered whether it might be possible to invoke a CEA task and, rather than using myspace or vospace, provide an FTP location for indirect parameters or results. CEA reads and writes indirect parameters and results using the standard Java URL-handling libraries - which contain basic clients for HTTP and FTP.
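As a rough illustration of what "standard Java URL-handling" means here, the sketch below reads the full contents of a resource through java.net.URL. It is not the CEA code; the class and method names (IndirectParameterReader, read) are made up for this example, and it assumes the content is UTF-8 text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of CEA: fetches an indirect parameter by URL.
public class IndirectParameterReader {

    /** Reads everything behind a URL (http:, ftp:, file:, ...) as UTF-8 text. */
    public static String read(String location) throws IOException {
        // The JDK picks a protocol handler from the URL scheme.
        URL url = URI.create(location).toURL();
        try (InputStream in = url.openStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

The same call works for any scheme the JDK has a handler for, so an ftp:// location is read exactly like an http:// one. Writing would presumably go the other way, through URLConnection.setDoOutput(true) and getOutputStream(), but whether CEA does it that way is not something these experiments establish.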

It has previously been shown that an HTTP URL can be used for a CEA indirect input parameter - this was used extensively in JES workflows to chain tasks together. However, HTTP doesn't have a well-defined method of writing files - so it's only of use for CEA inputs, not results.

Similarly, there's code in the vfs layer of VoDesktop for working with FTP - so, with the help of Gary and Mark, I ran some experiments with the 2007.4.beta1 client and the installed CEA servers to see whether FTP is possible.

Now I accept that FTP has hideous security holes, and using it in the VO is a bit daft with single-sign-on and vospace just around the corner. The vfs layer in the client comes with an implementation for SFTP, which would be preferable; however, a corresponding library isn't available on the CEA servers. FTP is the lowest common denominator - so, limitations of FTP aside, does it work?

A: partly. Input parameters can be read from FTP, but it seems that results cannot be written to FTP, at least for DSA.

FileExplorer to Anonymous and Authenticated FTP.

I found that FileExplorer works nicely with FTP. Its address bar accepts either an 'anonymous' FTP URL like ftp://server.name/ or an authenticated FTP URL like ftp://user:pass@server.name/path . The contents are displayed and are browsable, and it's possible to upload, overwrite, download and delete files & folders. FTP locations can also be bookmarked so that they're easy to revisit. So far so good...
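To make the two URL forms concrete, here is a small sketch that splits an FTP URL into its parts with java.net.URI and reports whether it carries credentials. The class name FtpLocation and the describe method are invented for this illustration; nothing here is taken from FileExplorer.

```java
import java.net.URI;

// Hypothetical helper: shows how the two FTP URL forms differ.
public class FtpLocation {

    /** Summarises scheme, host, path and whether credentials are embedded. */
    public static String describe(String location) {
        URI uri = URI.create(location);
        // getUserInfo() is null when the URL has no "user:pass@" part.
        String userInfo = uri.getUserInfo();
        String mode = (userInfo == null)
                ? "anonymous"
                : "authenticated as " + userInfo.split(":", 2)[0];
        return uri.getScheme() + " " + uri.getHost() + " " + uri.getPath()
                + " (" + mode + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("ftp://server.name/"));
        System.out.println(describe("ftp://user:pass@server.name/path"));
    }
}
```

Note that the password travels in the URL itself, so it ends up wherever the URL is stored (bookmarks, tool documents, logs), which is one more reason to treat FTP as a stopgap.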

CEA Result to Anonymous FTP

I selected the SSA application ivo://wfau.roe.ac.uk/ssa-dsa/ceaApplication, and ran a simple cone-search, position m32. Running this with a direct result (i.e. 'results to cache') ran to completion, as expected.

Then I tried re-running and sending the results to FTP. Clicking the 'indirect' button in the task runner brings up the file chooser dialogue, in which I can use the previously bookmarked FTP location to select a new output file on the FTP server. So everything is fine from the UI side.

However, running the application produced the following error message from the CEA server:

 failed to retrieve ftp://.../result.vot - does not exist

Which is odd, as this is an output, and shouldn't need to exist beforehand.

As a work-around, I created the output file in FileExplorer and tried re-running. This time the application ended with 'ERROR', with no further information given. The logs on the server contain the following exception:

Query parameter error: couldn't process results destination: String
index out of range: -1 executing wfau.roe.ac.uk/ssa-dsa/ceaApplication
org.astrogrid.applications.CeaException: Query parameter error:
couldn't process results destination: String index out of range: -1
        at org.astrogrid.dataservice.service.cea.DatacenterApplication.getResultTargetIdentifier(DatacenterApplication.java:268)
        at org.astrogrid.dataservice.service.cea.DatacenterApplication.run(DatacenterApplication.java:131)
        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(PooledExecutor.java:725)
        at java.lang.Thread.run(Thread.java:595)
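I haven't looked at what DatacenterApplication.java:268 actually does, so the following is only a guess at the class of bug. In Java, an index of -1 usually comes from indexOf() not finding what it was looking for, with the result then passed straight to substring(). The sketch assumes, purely for illustration, that the code expects some separator character that an ftp:// URL doesn't contain; the class name, method names and the '#' separator are all hypothetical.

```java
// Hypothetical illustration only; NOT the DSA source code.
public class ResultTargetSketch {

    // Assumed separator, chosen for the example; the real code may differ.
    private static final char SEPARATOR = '#';

    /** Fails with StringIndexOutOfBoundsException when the separator is missing. */
    public static String unguardedPrefix(String destination) {
        int cut = destination.indexOf(SEPARATOR); // -1 if not found
        return destination.substring(0, cut);     // throws when cut == -1
    }

    /** Same logic, but falls back to the whole string when nothing is found. */
    public static String guardedPrefix(String destination) {
        int cut = destination.indexOf(SEPARATOR);
        return (cut < 0) ? destination : destination.substring(0, cut);
    }
}
```

If something along these lines is the cause, the failure would be in parsing the destination rather than in FTP itself, which would fit with the input case working.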

CEA Parameter from Anonymous FTP

I then ran the ADQL query interface to SSA. I constructed a basic query:

 Select Top 100 a.ssaID From CrossNeighbours2MASSPSC as a

and ran it in direct-result mode. As expected, it produced a VOTable of numbers.

I then altered the TaskRunner form to take the query parameter from an FTP location. Once I'd created the query file*, it was straightforward to select it from the file chooser dialogue. The CEA application ran fine, fetching the query from FTP and producing the expected results.

* Aside:
I actually found it quite hard to create the remote query file. As this query is to be passed straight to the DSA, it needs to be in ADQL/x format - but there's no facility (that I could see) in the TaskRunner / QueryBuilder to save, export, cut-n-paste, or drag-n-drop the ADQL/x query. I guess eventually DSA will accept ADQL/s queries natively, but for now I had to save the CEA tool document, then edit it to cut out and unescape the embedded ADQL/x query - much too techy.
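The manual cut-and-unescape step could in principle be scripted. The sketch below lets an XML parser do the unescaping: reading the text content of an element turns &lt; and &gt; back into angle brackets. The class name and the element name passed in are placeholders; I haven't checked them against the real tool document layout, so treat the element name as something to look up in a saved document first.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls an escaped, embedded query out of an XML document.
public class EmbeddedQueryExtractor {

    /** Returns the unescaped text of the first element with the given tag name. */
    public static String extract(String toolXml, String elementName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(toolXml)));
        Node node = doc.getElementsByTagName(elementName).item(0);
        if (node == null) {
            throw new IllegalArgumentException("no <" + elementName + "> element found");
        }
        // getTextContent() resolves entity references such as &lt; and &amp;.
        return node.getTextContent();
    }
}
```

The returned string could then be saved to the FTP server as the query file.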

Authenticated FTP

Before trying anonymous FTP, I'd done the same tests with authenticated FTP.

Writing a result to authenticated FTP failed, but I didn't look in the logs for a cause - I suspect it'd be the same cause as for anonymous FTP.

Reading a parameter from authenticated FTP failed, unlike the anonymous example above. However, I think I used an ADQL/s query by mistake, and I wasn't able to look at the logs for a cause.

Further work.

  • The tests should be re-run with a non-DSA CEA server - it might be something specific to the DSA implementation which causes the failure when writing a result to FTP.
    • can anyone suggest a simple-to-invoke non-DSA CEA app?
  • Re-try tests of authenticated FTP.

