r12 - 19 Dec 2002 - 17:16:59 - AndyLawrence

PhaseAReport

(4) Virtual Observatory Prototypes

(4.0) Introduction

The idea of the Virtual Observatory arose gradually over a period of time in which astronomical data archive sites continually improved their facilities and user-interfaces, but in a piece-meal and uncoordinated way. Many of us realised that a coherent and planned approach, with standardised interfaces, would permit a radical advance. As part of AstroGrid's initial programme, therefore, we carried out a short survey of the most advanced existing web sites and software packages to determine the facilities they provided and how they worked. Nearly all of these rely on technical solutions which pre-date Data Grids and the Web Services paradigm, so we did not expect there to be much scope for direct technology transfer, but the facilities reflect the perceived needs of the astronomical community, and we hoped to learn a lot from them about what AstroGrid needs to provide, and note the strengths and weaknesses of the current solutions.

These investigations were carried out as a joint exercise between our grid technology and database technology teams. It should be noted that they represent a snapshot of practice in the early part of 2002, and that many facilities have changed since then.

(4.1) Current Services, Sites, and Software

4.1.1 Astrobrowse

Astrobrowse is one of the web interfaces provided by the High Energy Astrophysics Science Archive Research Center (HEASARC) at NASA Goddard Space Flight Center (GSFC). Its most advanced facility is a distributed cone search: users can enter the coordinates of an object and a search radius into a single form and then search a large number of different on-line astronomical catalogs from around the world. Alternatively one can specify the name of an object, and NED or Simbad will be used to determine its coordinates.

There is a choice of a quick or full search, the latter form giving many more options and allowing a more selective search of archive sites. Bandpass, data type, and other keywords can be specified to refine the selection. One can also select which types of service to interrogate (e.g. optical or radio), or which individual servers, with considerable flexibility.

Astrobrowse uses AstroGLU software, a development of the GLU system written at CDS (Strasbourg). This contains a list of URLs of each data archive service and the parameters each requires, which it uses to generate a customised CGI query for each of them in turn. Only one or two sites have adopted the conventions proposed by CDS, so the GLU database is essentially a loving compilation of the idiosyncrasies of each site.
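The idea of a per-service dictionary can be illustrated with a minimal Python sketch. The service names, URLs and parameter names below are invented for illustration and are not taken from the GLU database; the point is only that a single cone search has to be translated into a differently shaped CGI query for each archive.

```python
from urllib.parse import urlencode

# Hypothetical service profiles (not real GLU entries): each archive uses its
# own parameter names and may expect the search radius in different units.
SERVICES = {
    "archive_a": {
        "url": "http://archive-a.example/cgi-bin/search",
        "params": {"ra": "RA", "dec": "DEC", "radius": "SR"},
        "radius_unit": "deg",
    },
    "archive_b": {
        "url": "http://archive-b.example/query.pl",
        "params": {"ra": "lon", "dec": "lat", "radius": "rad_arcmin"},
        "radius_unit": "arcmin",
    },
}


def build_query(profile, ra_deg, dec_deg, radius_deg):
    """Translate one cone search into the CGI query a given service expects."""
    radius = radius_deg * 60.0 if profile["radius_unit"] == "arcmin" else radius_deg
    names = profile["params"]
    query = {
        names["ra"]: f"{ra_deg:.5f}",
        names["dec"]: f"{dec_deg:.5f}",
        names["radius"]: f"{radius:g}",
    }
    return profile["url"] + "?" + urlencode(query)


def build_all(ra_deg, dec_deg, radius_deg):
    """Generate one customised query URL per registered service."""
    return {
        name: build_query(profile, ra_deg, dec_deg, radius_deg)
        for name, profile in SERVICES.items()
    }


urls = build_all(10.0, -5.0, 0.5)
for name, url in urls.items():
    print(name, url)
```

Adding a new archive then means adding a profile rather than changing code, which is also why such a dictionary ends up recording every site's idiosyncrasies.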

Astrobrowse then waits for the various replies to come back. This may be slow if one selects a large number of servers around the world: the results page has a side-bar reporting the status of each service, with an option to refresh this at intervals. In practice it takes only a few seconds to get results from most servers, but it may be necessary to wait for a few minutes for all of them to respond (and at times some responses never appear). Astrobrowse has not attempted to solve the problem of integrating the results, which appear in a wide variety of formats, requiring some expertise to understand. This is obviously a difficult problem, but one which the VO needs eventually to solve (e.g. with the aid of VOTable and UCDs).
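The behaviour described above, sending many queries at once and reporting a status for each, can be sketched in Python as follows. This is not how Astrobrowse is implemented; the three "services" are local stand-in functions (assumptions for illustration) and the status labels are our own.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def query_all(services, timeout_s):
    """Run every service query in parallel and report a status for each.

    services: mapping of name -> zero-argument callable returning a result.
    Returns (status, results); status is "done", "failed" or "running".
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(services)))
    futures = {name: pool.submit(fn) for name, fn in services.items()}
    # Block until everything has finished or the time limit is reached.
    wait(list(futures.values()), timeout=timeout_s)
    status, results = {}, {}
    for name, future in futures.items():
        if not future.done():
            status[name] = "running"      # no reply yet; could be polled again
        elif future.exception() is not None:
            status[name] = "failed"
        else:
            status[name] = "done"
            results[name] = future.result()
    pool.shutdown(wait=False)             # do not block on the stragglers
    return status, results


# Stand-in services: one answers at once, one fails, one does not answer in time.
release = threading.Event()


def fast_service():
    return ["row1", "row2"]


def broken_service():
    raise IOError("server error")


def slow_service():
    release.wait(10)
    return []


status, results = query_all(
    {"fast": fast_service, "broken": broken_service, "slow": slow_service},
    timeout_s=0.5,
)
release.set()  # let the slow stand-in finish so the program can exit
print(status)
```

A real client would keep the unfinished requests and refresh their status at intervals, as the Astrobrowse side-bar does, rather than abandoning them.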

The software is available for download, and a version is also in use at Harvard/Smithsonian Astrophysical Observatory. Our tests were carried out on version 1.7.

4.1.2 Browse/W3Browse

Browse also comes from HEASARC at GSFC: it is essentially a search engine for tabular data, originally designed for data from high-energy observatories, but now broader in scope, including many radio and optical catalogues, as well as links to Vizier. The underlying DBMS is currently Sybase, but some attempt has been made to keep the software DBMS-independent. Version 6.3 was current when these comments were made. As with many astronomical archives, the primary search is by position or object name, resolved using Simbad or NED. Results can be produced in four formats: plain text, HTML tables, FITS tables, or Astrores XML.

Although the basic facilities are similar to many other data archives, Browse has two particular strengths:

  • It can cross-correlate (join) two or more tables on celestial position (or date/time of observation). The results of the join can be sorted, or plotted if the browser is Java-enabled (but this often turns out to be slow).
  • Browse also has links to original datasets so that, having found the required observation, users can download data products from observatory missions.

These two features, joining tables and downloading data products, are very valuable, and must form a feature of our VO design.
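A join on celestial position can be sketched as below. This is a brute-force illustration in Python, not Browse's algorithm; the table contents and the 5 arcsecond matching radius are made-up values, and a production service would use a spatial index rather than comparing every pair of rows.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(table_a, table_b, max_sep_deg):
    """Join two tables on position: keep every pair closer than max_sep_deg.

    Each row is a dict with 'ra' and 'dec' in degrees. Cost is O(N*M).
    """
    pairs = []
    for a in table_a:
        for b in table_b:
            sep = angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= max_sep_deg:
                pairs.append((a, b, sep))
    return pairs


# Illustrative tables (invented values).
xray = [
    {"name": "X1", "ra": 10.0, "dec": 20.0},
    {"name": "X2", "ra": 150.0, "dec": -30.0},
]
optical = [
    {"name": "O1", "ra": 10.001, "dec": 20.0005},
    {"name": "O2", "ra": 200.0, "dec": 5.0},
]

matches = cross_match(xray, optical, max_sep_deg=5.0 / 3600.0)
for a, b, sep in matches:
    print(a["name"], b["name"], f"{sep * 3600.0:.2f} arcsec")
```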

HEASARC's web-based browse service was for a time known as W3Browse to distinguish it from the original BROWSE service accessed by telnet. The software was originally written at ESOC around 1980 as the interface to the EXOSAT Observatory's data archive. The service later moved to ESTEC, then to HEASARC, and versions were subsequently installed at other sites including LEDAS (Leicester) and MPE (Garching). The original telnet service is still available at LEDAS, although usage is dwindling. A few of the useful features of the original Browse have been lost in the web version, e.g. the ability to save the results of a filtering (select) operation and then make further selections on that. These features may be beyond the scope of a simple web service, but certainly ought to be provided by a data mining service. This is something that our MySpace concept should be able to address.

4.1.3 CURSA

CURSA is a Starlink package for manipulating astronomical catalogues and tables. It is mostly concerned with accessing catalogues held as local files, but also provides some facilities for searching remote catalogues. Remote catalogue searches are available within the GUI-based catalogue browser xcatview and from the Unix command-line by using the application catremote. The only type of remote search supported by either application is the "cone search" to find objects within a specified angular separation of a specified central celestial coordinate. Optionally, the name of an astronomical object may be given instead of a central coordinate and the SIMBAD or NED name-resolver is used to replace the object name with the corresponding coordinates. For some catalogues catremote also allows additional selections on pre-defined columns (for example, limiting the selected objects in the specified region of the sky to also lie in a given magnitude range).
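For readers unfamiliar with the term, a cone search with an optional magnitude cut can be sketched in a few lines of Python. This is an illustration of the selection itself, not of catremote or of any remote protocol; the catalogue rows and the column names ra, dec and mag are assumptions made for the example.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector for a position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def cone_search(rows, ra0, dec0, radius_deg, mag_range=None):
    """Select rows within radius_deg of (ra0, dec0), optionally within a magnitude range.

    A row lies inside the cone when the dot product of its unit vector with
    that of the centre is at least cos(radius). This handles the RA = 0 wrap
    and the poles without special cases.
    """
    centre = unit_vector(ra0, dec0)
    cos_limit = math.cos(math.radians(radius_deg))
    selected = []
    for row in rows:
        vec = unit_vector(row["ra"], row["dec"])
        if sum(c * v for c, v in zip(centre, vec)) < cos_limit:
            continue  # outside the cone
        if mag_range is not None:
            faint_limit_low, faint_limit_high = mag_range
            if not (faint_limit_low <= row["mag"] <= faint_limit_high):
                continue  # inside the cone but outside the magnitude range
        selected.append(row)
    return selected


# Invented catalogue rows for illustration.
catalogue = [
    {"name": "A", "ra": 180.00, "dec": 0.00, "mag": 12.0},
    {"name": "B", "ra": 180.05, "dec": 0.02, "mag": 16.0},
    {"name": "C", "ra": 181.00, "dec": 0.00, "mag": 13.0},
]

in_cone = cone_search(catalogue, 180.0, 0.0, 0.1)
bright_in_cone = cone_search(catalogue, 180.0, 0.0, 0.1, mag_range=(10.0, 14.0))
print([r["name"] for r in in_cone], [r["name"] for r in bright_in_cone])
```

Because the function takes plain arguments and returns plain rows, it is as easy to call from a script as from an interactive session, which is the property noted for catremote.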

CURSA uses exactly the same mechanisms and formats as SkyCat and GAIA for submitting queries to a remote catalogue and returning the table of results, and has the same advantages and limitations. The most notable limitation is that it is only possible to search one remote catalogue at a time. xcatview's GUI for searching remote catalogues has a rather different layout to SkyCat's. catremote is suitable for embedding in scripts as well as for interactive use from the command-line. Searching elements of the VO from within scripts which perform specialised, bespoke tasks seems a likely requirement for the VO.

It is also worth noting that CURSA is based on the FITSIO library written at GSFC, which has FTP and HTTP protocols for data access, so that all the CURSA tasks can access remote tables if they are present on FTP or HTTP repositories in FITS table format. They do this, however, by copying the entire file to local memory or scratch disc. The same facility is built in to the FTOOLS utilities for handling FITS files provided by GSFC. For small tables this is fine, but for large tables bandwidth can be a problem.

The tests were done on CURSA version 6.4. Documentation is provided in two Starlink documents, SUN/190 and SSN/76.

4.1.4 ISAIA

The ISAIA project, led by GSFC, was intended to develop a number of virtual observatory concepts, such as the integration of results from queries like those sent out by Astrobrowse. The project is no longer active, and those involved are now part of the NVO team. The website contains several useful documents, but these are now being overtaken by more recent developments.

4.1.5 MAST

MAST (Multimission Archive at Space Telescope) comes from the Space Telescope Science Institute (STScI) and is the optical/UV/near-IR component of NASA's distributed Space Science Data Services. It aims to provide integrated access to data from a range of missions/projects, namely: Hubble Space Telescope (HST), Far Ultraviolet Spectroscopic Explorer (FUSE), International Ultraviolet Explorer (IUE), Extreme Ultraviolet Explorer (EUVE), Hopkins Ultraviolet Telescope (HUT), Ultraviolet Imaging Telescope (UIT), Wisconsin Ultraviolet Photo-Polarimeter Experiment (WUPPE), Copernicus (OAO-3), Orbiting and Retrievable Far and Extreme Ultraviolet Spectrometer (ORFEUS), Berkeley Extreme and Far-UV Spectrometer (BEFS), Interstellar Medium Absorption Profile Spectrograph (IMAPS) (first flight), Tübingen Echelle Spectrograph (TUES), Digitized Sky Survey (DSS), Guide Star Catalog II (GSCII), Sloan Digital Sky Survey (SDSS), FIRST (VLA radio data), Roentgen Satellite (ROSAT).

MAST offers a series of cross-mission search tools, which vary from tools directed to specific science cases (e.g. find all data in MAST archives close to Abell clusters) to a Single Target quick search interface, where one can enter coordinates or source name (to be resolved by NED or SIMBAD), and get a list of data available within MAST: e.g. entering M31 yields the predictable long list of data - one nice feature is that preview versions of images (in GIF format) are provided. This interface also enables searches by data type - e.g. checking the "X-ray spectra" box returns a link to the top page of MAST's ROSAT site, as well as a helpful note that HEASARC provide access to a wider range of high-energy data. There is also the MAST Scrapbook, which offers users "representative images or spectra of an astronomical object" (specified by coordinates or resolvable name, as before) - basically another way of asking for preview data.

MAST also hosts a series of Prepared Science Products. Examples of these are the Hubble Deep Field (North and South) datasets, various UV spectral atlases and the SDSS Quasar Catalog, derived from the Early Data Release of the Sloan Digital Sky Survey. A page on Data Analysis Software lists "some of the data analysis software packages used for the MAST archived data", mostly comprising IRAF packages written for specific instruments. Data transfer varies between the different archives in the MAST system, but is usually either via anonymous ftp or direct downloading from a WWW browser.

Overall, MAST is a good example of a current-generation archive site: it provides access to data from a number of sources in a reasonably coherent and user-friendly manner, with a reasonable amount of documentation (the documentation for the SDSS EDR site has improved markedly in recent months) and a Helpdesk which is staffed Mon-Fri 09:00-17:00 (EST). However, it seems very interactive: there is no obvious batch mode facility, nor any other means of submitting large numbers of queries in an automated fashion.

4.1.6 NED

NASA's Infrared Processing and Analysis Center (IPAC) at Caltech provides the NASA/IPAC Extragalactic Database (NED). It is built around a master list of extra-galactic objects for which cross-identifications of names have been established, accurate positions and redshifts entered to the extent possible, and some basic data collected. Bibliographic references relevant to individual objects have been compiled, and abstracts of extra-galactic interest are kept on line. Detailed and referenced photometry, position, and redshift data have been taken from large compilations and from the literature. NED also includes images for over 700,000 extra-galactic objects from 2MASS, from the literature, and from the Digitized Sky Survey. NED's data and references are being continually updated, with revised versions being put on-line every 2-3 months. In essence, therefore, NED provides facilities somewhat similar to those of CDS, but only for identified extragalactic objects, and specially adapted to the needs of those studying them.

4.1.7 Querator

Querator is a tool for extracting images of a given region of sky from image surveys. It was developed by Francesco Pierfederici at the European Southern Observatory (ESO). In a single query it can extract images of the same region of sky from several different surveys, thus allowing a stack of images, typically in different colours or wavelength ranges, to be returned. Currently images from the HST and various ESO telescopes are available. The region of sky to be extracted can be specified in a number of ways:

  • object name,
  • sky box,
  • external server search,
  • user file upload.

The object name, sky box and user file upload options are all as would be expected. In the first two cases the user gives, respectively, an object name or the meridians of Right Ascension and parallels of Declination defining a region of sky. Additional constraints (exposure time, observation date, wavelength range, instrument etc.) can be specified to refine the search. In the user file upload option the user gives the name of a prepared file containing a list of object names or coordinates. This useful option allows images to be retrieved for a number of regions in a single operation.

The "external server search" option is more interesting and innovative. Here the user submits a query to a remote catalogue archive (such as LEDA or the NASA ADC catalogue collection), which is quite separate from the data centre holding the image surveys, in order to search one or more catalogues according to an arbitrary criterion the user has supplied. The remote catalogue archive returns a list of objects which satisfy the query. Querator takes this list and retrieves images for all the objects listed.

Access to Querator is solely through a Web interface, which is generally easy to use. However, constructing the query for the remote archive in the "external server search" option is complicated. The query is constructed using the native syntax of the remote service and thus varies between different services. Querator seems to still be under active development. The query pages crash Netscape running on a Compaq/Alpha but are OK on Sun/Solaris (though this could be a bug in Netscape for all that I know). The surveys currently accessible mostly seem to consist of pointed observations. However, presumably, there is no reason why Querator could not access contiguous surveys, such as the DSS.

Querator has a number of features which seem likely to be required in the VO, including the ability to retrieve a stack of images from several surveys in one query and something analogous to the "external server search" option. However, making the latter easy to use requires a unified (and simple) syntax for specifying queries on all the remote catalogues.

No version number was given. The tests were conducted on 13 February 2002.

4.1.8 Simbad, Vizier, and Aladin

These interwoven services are provided by the Centre de Données astronomiques de Strasbourg (CDS), the oldest and probably largest collection of astronomical data resources in the world. There are many interconnections between the separate services which make the system easier to use, but make it harder to see which part does what. Essentially:

  • Simbad is principally a bibliographic archive, which includes information about all papers in the primary astronomical literature about objects beyond our solar system, including the properties of the celestial objects listed therein. This means that Simbad holds many of the small surveys in the literature (e.g. fields of the 5th Cambridge catalogues of radio sources) but not the massive data-collections like SDSS. Naturally, the types of measurement are very varied. Simbad's web interface only allows queries on a few basic parameters, most of which are biased towards stellar astronomy. We note that Simbad's object classification scheme needs to be considered in our ontology efforts.
  • Vizier is billed as a "catalogue of catalogues", which underplays what it can be used for. There are two main uses: selecting catalogues (by criteria such as "contains QSOs") and listing the descriptions of those catalogues; and selecting objects from the union of all the catalogues by various criteria. That is, Vizier allows both metadata and data searches. The fact that these two modes are driven from the same interface-page makes Vizier harder to use than it need be. The "union of all catalogues" seems to mean the catalogues absorbed into Simbad plus major external data-collections such as 2MASS.
  • Aladin is an image display with advanced overlay-features. The Vizier and Simbad operations can display results by returning a web page in which Aladin runs as an applet with the data preloaded. Alternatively, Aladin can run as an application and can send queries to Simbad and Vizier in response to user actions. Aladin allows overlay plots from many catalogues to be stacked up, and provides good controls for manipulating the stack (e.g. controlling visibility of particular planes).
  • The bibliographic database lists the papers from which data were taken for Simbad. This makes it excellent for use with Simbad and dangerous to use for any other purpose due to the specialized pre-selection. The service also holds on-line abstracts of recent papers in selected journals.

The popularity of these services is shown by the existence of several mirrors: Simbad has a mirror in the USA, while there are already half a dozen mirrors of Vizier, including one in the UK (Cambridge). Simbad is also the principal name resolver (translator from celestial object name to coordinates) used by many other sites around the world: it is an obvious candidate for conversion to a Web Service using SOAP/WSDL.

4.1.9 Skyview and Skymorph

SkyView is another service of HEASARC at GSFC. It describes itself in these terms: "SkyView is a Virtual Observatory on the Net generating images of any part of the sky at wavelengths in all regimes from Radio to Gamma-Ray."

The SkyView server contains copies of images of the sky taken in a wide range of wavebands from radio to gamma ray, mostly (perhaps all) stored as FITS files. The SkyView software, written at GSFC, selects and overlays these images, giving results in one's chosen resolution, and it automatically handles rotation, precession, coordinate transformations, and pixel re-sampling. The results can be seen on the screen, or an image can be downloaded from an FTP area in a number of formats including FITS, TIFF, GIF, and PostScript. There are actually five different interfaces: for the non-astronomer, basic, advanced, Java, and X-windows. The last is regarded as obsolescent, now that Java can provide the required controls. More advanced image options, such as changes to color tables, overlays on existing images, image rescaling, zooming, etc. require a Java-enabled browser.

Advanced options include the ability to overlay data from two or three different data sources, perhaps mapping each to a different primary colour, producing a pseudo-colour result. This is even possible for those using 8-bit displays. SkyView can also implement boxcar averaging of an image, to obtain a smoothed result. There is also a batch option, with Perl scripts which can be downloaded and run on a Unix/Linux system. The software is freely available for download.
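Boxcar averaging replaces each pixel by the unweighted mean of the pixels in a square box centred on it. The Python sketch below shows the operation on a small image held as a list of rows; it is not SkyView's code, and the treatment of the image edges (averaging only over the pixels that fall inside the image) is our assumption.

```python
def boxcar_smooth(image, width):
    """Smooth an image with a width x width boxcar (moving-average) filter.

    image: list of equal-length rows of numbers; width: odd box size in pixels.
    Near the edges the mean is taken over the part of the box inside the image.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    ny, nx = len(image), len(image[0])
    smoothed = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            total, count = 0.0, 0
            for j in range(max(0, y - half), min(ny, y + half + 1)):
                for i in range(max(0, x - half), min(nx, x + half + 1)):
                    total += image[j][i]
                    count += 1
            smoothed[y][x] = total / count
    return smoothed


# A single bright pixel is spread over its neighbours.
image = [
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0],
]
smoothed = boxcar_smooth(image, 3)
for row in smoothed:
    print(row)
```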

The software, written by Tom McGlynn and his team, is all available for downloading and external use. The comments here apply to Web Version 4.1, with version 3.2 of their Geometry Engine. A new interface is currently (2002-02-01) on beta-test and allows the interface to be customized; a few functions did not seem to be working correctly when tested.

Overall the SkyView facilities for image selection and display are so comprehensive, and so well covered by the documentation, that it is hard to think of features still lacking. But it must be noted that this facility has been provided entirely using local storage, by taking copies of datasets produced elsewhere and, where necessary, reformatting them to suit SkyView. It was, apparently, decided when SkyView was designed that the Internet did not have enough bandwidth to allow the retrieval of images from remote sites.

SkyMorph specialises in searches for variable, moving or transient objects. It provides convenient access to optical images and catalogs generated by the Near Earth Asteroid Tracking (NEAT) program. These include more than 67,000 CCD images covering a large fraction of the sky. The same region is typically observed several times each night, and is revisited on monthly and yearly timescales. SkyMorph appears to be based on SkyView, and seems to have few unique features of VO importance, but it is one of the few services which supports the time dimension, which AstroGrid must not neglect.

4.1.10 Skycat, GAIA, and JSkyCat

SkyCat is an image display tool developed by Allan Brighton and colleagues as part of the ESO VLT project. GAIA is an enhancement of SkyCat by Starlink, which has added numerous astronomical analysis facilities, including astrometric calibration, automatic object detection, and aperture, optimal and surface photometry. Both SkyCat and GAIA are mostly concerned with accessing local files. However, they both contain some limited facilities for accessing remote catalogues and image surveys. GAIA's facilities in this area are identical to SkyCat's and the following notes apply to both applications.

SkyCat and GAIA can access a reasonably extensive remote collection of standard astronomical catalogues and a few image surveys, principally the HST Digitised Sky Survey (DSS). The principal purpose of remote catalogue searches in SkyCat and GAIA is to find objects which overlay an image that has already been displayed by the application (though searches can be made which are not connected with any image). Consequently, the only type of remote search supported is the "cone search" to find objects within a specified angular separation (or "radius") of a specified central celestial coordinate. Optionally, the name of an astronomical object may be given instead of a central coordinate and the SIMBAD or NED name-resolver is used to replace the object name with the corresponding coordinates. For some catalogues additional selections are also supported on pre-defined columns. For example, it may be possible to select objects which lie in the specified region of sky and which also lie within a given magnitude range.

Regions of sky can be extracted from image surveys by specifying the central coordinates and size of the field required. Again, optionally, an object name can be substituted for the central coordinates.

Skycat and GAIA have a convenient user-interface which is well-integrated with the rest of the display functions. Retrieved objects are automatically plotted on top of a displayed image if they overlay it. It is easy to highlight a given object in both a table of the selected objects and in an image overlay plot.

The list of remote catalogues and image surveys available to Skycat and GAIA is held as a text file. This arrangement is good in that the list is not hard-wired into the code and can be customised, but is bad in that the file has to be edited manually, rather than maintained automatically by a "resource register" of the sort that we have been discussing.

Queries are submitted and results returned using HTTP protocols. The query format is somewhat restricted (and is similar to, but not identical with, the ASU query standard). Tables are returned in the Tab-Separated Table (TST) format, which is somewhat deficient in catalogue metadata, though it does contain enough information to define how objects are to be plotted on overlays (ellipses etc). Images are returned as FITS files.
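To show how little metadata such a table carries, the following Python sketch parses a tab-separated table. The layout assumed here (optional header lines, a line of tab-separated column names, a line of dashes, then tab-separated rows, optionally ended by an [EOD] marker) is a simplification that should be checked against the SkyCat documentation, and the sample content, including the ra_col and dec_col parameter lines, is invented for illustration.

```python
def parse_tst(text):
    """Parse a tab-separated table into (params, columns, rows).

    Assumed layout: header lines, a column-name line, a dashed separator
    line, data rows, and an optional "[EOD]" end marker.
    """
    lines = text.splitlines()
    separator = None
    for index, line in enumerate(lines):
        stripped = line.replace("\t", "").strip()
        if stripped and set(stripped) == {"-"}:
            separator = index  # first line made only of dashes (and tabs)
            break
    if separator is None or separator == 0:
        raise ValueError("no column separator line found")

    columns = [name.strip() for name in lines[separator - 1].split("\t")]

    # Header lines of the form "key: value" are kept as table parameters.
    params = {}
    for line in lines[: separator - 1]:
        if ":" in line and not line.startswith("#"):
            key, value = line.split(":", 1)
            params[key.strip()] = value.strip()

    rows = []
    for line in lines[separator + 1:]:
        if line.strip() == "[EOD]":
            break
        if not line.strip():
            continue
        values = [value.strip() for value in line.split("\t")]
        rows.append(dict(zip(columns, values)))
    return params, columns, rows


# Invented sample table.
SAMPLE = "\n".join([
    "Example catalogue",
    "ra_col: 1",
    "dec_col: 2",
    "Id\tRA\tDec\tMag",
    "--\t--\t---\t---",
    "obj1\t10.5\t-20.25\t14.2",
    "obj2\t10.6\t-20.30\t15.0",
    "[EOD]",
])

params, columns, rows = parse_tst(SAMPLE)
print(params, columns, len(rows))
```

Note that every value comes back as a string: nothing in this layout records units, data types or the meaning of a column, which is the deficiency in catalogue metadata referred to above.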

A VO client or portal would need to provide at least all the remote access facilities of SkyCat and GAIA. Their principal disadvantage is that they can only search one catalogue at a time.

The version tested was GAIA version 2.6, derived from SkyCat version 2.4. On-line documentation for SkyCat is available from its home page at ESO.

JSkyCat is a re-implementation of Skycat (above) in Java. It was also developed by ESO. It has similar functionality to the original Skycat, but has fewer features because it is still under development. JSkyCat is written using the JSky Java class library, elements of which are also used in the Gemini Observation Planning Tool.

The remote catalogue and image survey access facilities in JSkyCat are essentially identical to those in SkyCat: it provides the same functionality, uses the same mechanisms and formats for submitting queries and returning tables of results and has the same advantages and disadvantages.

The version tested was JSkyCat 1.2; on-line documentation is available from the ESO web pages.

4.1.11 Starcast

Starcast, also from STScI, is MAST's prototype implementation of Astrobrowse, described above. The Starcast implementation currently uses a Perl interface to the profile database, not the CDS GLU system as used in the original Astrobrowse prototype at HEASARC, but is intended to migrate to using GLU at some point: this will mean that the Starcast administrator will not have to input the profiles manually, as is currently the case.

The Starcast query form allows the user to search for data around a sky position or an object with a name that can be resolved into a sky position by NED or SIMBAD. The user then specifies the Bandpass (running from radio to gamma ray), Data Source (with choices Any, Derived, Observations, Pointed, Proposal, Survey, Survey Data), the Data Type (Any, Catalog, Image, Images, Other, Spectra, Spectrum, Time-series), and Location (Any, or a selection from a list of about 30 international data centres) and sets the query running. The browser moves to a new page, with two frames, one of which lists the specification of the query, and the second gives links to the services which might have data satisfying the query: next to each of these links is an icon showing whether the search on that resource is running, has completed successfully or has crashed, and these may be updated by pressing a Check status button at the top of the frame.

The implementation of this service seems incomplete, in some sense. For example, a test query asking for EUV data from Any Source of Any Type and at Any Location within 10 arcmin of 10 00 00 -10 00 00 returned a number of links, one of which was to the IMPReSS interface at the NASA ADC at Goddard. Clicking on that link took me to a WWW page generated within the IMPReSS system, which listed the sky position for my search and presented me with a list of archives (not just EUV, but also X-ray and optical) with data around that position, asking me which I wanted to choose. Clearly, since I'd already specified my query on the Starcast WWW form, I should have been taken one stage further within IMPReSS. This is something of a minor quibble, for what is, after all, just a work-in-progress prototype implementation, but it does highlight the difficulty of fitting a top-level query interface on top of existing data centres, each of which provides access to its archives in a different way.

4.1.12 Starview

StarView comes from the Space Telescope Science Institute (STScI) and its blurb says that "StarView is an astronomical database browser and research analysis tool. Developed in Java, StarView provides an easy to use, highly capable user interface that runs on any Java enabled platform as a standalone application." Download and installation (under Windows NT) was remarkably simple, and the Java GUI is very nice. StarView can be used to search for data in MAST archives, examine the calibrations used for a particular dataset, and look at proposal information relating to past HST projects.

Downloading of data through StarView requires registration with the STScI archive, and is performed either by leaving the results file in an anonymous ftp site, or by ftping them to the user's machine: the latter requires supplying the user's password, and this is very unpopular with some system managers. One nice feature is that you can track the progress of your query on a WWW site.

Queries are defined by starting with a form (a number are provided, as templates for searching particular archives or making particular kinds of query) and then the user adds qualifications to narrow the search. Queries can be written out as SQL, which is nice, and there is also a Cross-Qualifier feature, which allows the results of one query to be used as input to constructing a second: this seems a very useful feature, but the instructions are not clear enough to enable a user to use this option at a cursory reading.
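The idea of feeding the results of one query into a second can be illustrated with a small Python sketch using an in-memory SQLite database. This is only our reading of what a cross-qualifier does, not a description of StarView's implementation, and the tables, columns and values are invented.

```python
import sqlite3

# Invented toy schema: proposals, and the datasets obtained under them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proposals (proposal_id INTEGER, pi_name TEXT, target TEXT);
CREATE TABLE datasets (dataset TEXT, proposal_id INTEGER, instrument TEXT);
INSERT INTO proposals VALUES (1, 'Smith', 'M31');
INSERT INTO proposals VALUES (2, 'Jones', 'NGC1068');
INSERT INTO proposals VALUES (3, 'Smith', 'M87');
INSERT INTO datasets VALUES ('d001', 1, 'WFPC2');
INSERT INTO datasets VALUES ('d002', 2, 'STIS');
INSERT INTO datasets VALUES ('d003', 3, 'WFPC2');
INSERT INTO datasets VALUES ('d004', 3, 'STIS');
""")

# First query: which proposals belong to a given investigator?
first_result = [
    row[0]
    for row in conn.execute(
        "SELECT proposal_id FROM proposals WHERE pi_name = ? ORDER BY proposal_id",
        ("Smith",),
    )
]

# Second query: its qualification is built from the first query's results.
placeholders = ",".join("?" * len(first_result))
second_sql = (
    "SELECT dataset FROM datasets "
    f"WHERE instrument = ? AND proposal_id IN ({placeholders}) "
    "ORDER BY dataset"
)
second_result = [row[0] for row in conn.execute(second_sql, ["WFPC2", *first_result])]

print(first_result, second_result)
```

In SQL the two steps could equally be written as a single statement with a sub-query; keeping them separate mirrors the interactive workflow, where the user inspects the first result before constructing the second query.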

The results of a query are listed in another GUI, and datasets can be selected from that window for further operations - e.g. previewing images or spectra, looking in ADS for references known to have resulted from that HST proposal, overlaying the instrument footprint on a DSS image - while the list itself can be exported as an ASCII file. One very nice feature is that the list of returned datasets can include proprietary ones, for which the date of public access is listed: slightly annoyingly, one has to remove those datasets from the list manually before asking to retrieve the data. Surely a better default is not to include them, and then only the PI (who can access them) would have to do anything. A variety of data types can be retrieved - it is interesting that one can retrieve data quality information and/or observing logs, in addition to the data themselves.

All in all, this is a very interesting tool, displaying much of the functionality that one would require for the VO. As with the MAST WWW interface, this is still very interactive, but it does have the advantage that one can store and reload queries one has formulated interactively using the GUI. Again as for the MAST WWW interface, there is no description of the technology used, beyond Java.

4.2 Testing CDS and NED with use cases

In order to find out what these services can do, we tried to use them to carry out the AstroGrid use cases, but found that only a very limited subset of what we wanted to do was currently possible. The section names in the following are the Wiki-names of the use cases in the VO Wiki-web.

4.2.1 FindQSOsByPosition

Vizier and Simbad can do the main flow of this use case, using Aladin as the display tool. Ironically, Aladin itself can't make the necessary selection. None of these tools can merge the tables of results.

NED can do some of this work. It can't select on radius from the search centre, but it can select on ranges of RA and dec., which is almost as good. It understands "QSOs" and "QSO clusters" as selection criteria. The plotter ("skyplot") from NED is poor (line graphics only) and is non-interactive: there is no way to select objects on the display and get to their details from the catalogue.
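Selecting on ranges of RA and dec. can stand in for a radius search if the ranges are chosen to enclose the cone. The Python sketch below computes such a box; it is our own illustration and has nothing to do with NED's internals. The RA half-width uses the standard spherical result sin(dRA) = sin(r) / cos(dec), which matters away from the equator.

```python
import math


def cone_to_box(ra0, dec0, radius):
    """Return (ra_min, ra_max, dec_min, dec_max) enclosing a cone; all in degrees.

    If ra_min > ra_max the RA range wraps through RA = 0. If the cone
    touches a pole, the full RA range 0..360 is returned.
    """
    dec_min = max(-90.0, dec0 - radius)
    dec_max = min(90.0, dec0 + radius)
    if dec_min <= -90.0 or dec_max >= 90.0:
        return 0.0, 360.0, dec_min, dec_max
    half_width = math.degrees(
        math.asin(math.sin(math.radians(radius)) / math.cos(math.radians(dec0)))
    )
    return (ra0 - half_width) % 360.0, (ra0 + half_width) % 360.0, dec_min, dec_max


equator_box = cone_to_box(180.0, 0.0, 1.0)
high_dec_box = cone_to_box(0.5, 60.0, 1.0)   # wraps through RA = 0
polar_box = cone_to_box(10.0, 89.5, 1.0)     # touches the north pole
print(equator_box, high_dec_box, polar_box)
```

The box always contains objects in its corners that lie outside the cone, so a final cut on angular separation is still needed to reproduce a true radius search.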

4.2.2 GetLiteratureReference

This feature is available in Simbad. Results of searches carry hyperlinks to entries in CDS' bibliographic service. However, only selected references are shown (to explain where the Simbad data came from, not to refer to the science). It is possible to query the bibliographic service directly, but the number of references returned is surprisingly small (e.g. 7 for a search on NGC1068).

NED allows one to query the database of abstracts directly by object name. This finds many (all?) references (e.g. 1249 for NGC1068).

4.2.3 GetReducedSpectra

There is no obvious way to get any actual spectral pixel-data from any of these systems except one small part of IPAC. The SWAS mission, available through IPAC, serves spectra as either web graphics or in FITS files.

4.2.4 InstrumentFootprint

None of the systems seem to allow an instrument footprint as a search area in a query.

Aladin allows a user to draw one of a set of limited footprints as an overlay on an existing plot. If one then measures the footprint in Aladin one can get a search radius that encloses the footprint, and can search on that radius in Vizier. This allows the work to be done manually.
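The manual step of measuring a radius that encloses the footprint can be sketched as follows. This is only an illustration under a flat-sky approximation (reasonable for footprints of a few arcminutes); the vertex offsets are hypothetical and do not describe any real instrument.

```python
import math

def enclosing_radius(vertices, centre=(0.0, 0.0)):
    """Smallest radius about `centre` that contains every footprint vertex.

    `vertices` are (x, y) offsets from the pointing centre, in arcseconds.
    For a polygonal footprint the farthest point from the centre is always
    a vertex, so checking the vertices is sufficient.
    """
    cx, cy = centre
    return max(math.hypot(x - cx, y - cy) for x, y in vertices)

# Hypothetical rectangular footprint, 160 x 120 arcsec, centred on the pointing.
footprint = [(-80, -60), (80, -60), (80, 60), (-80, 60)]
radius_arcsec = enclosing_radius(footprint)  # half the diagonal of the rectangle
```

The resulting radius over-covers the footprint, so objects returned by the radius search still have to be tested against the footprint outline itself.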

4.2.5 ObservingProposalCheckForData

None of the systems do this use case. There are no links to observation-proposal systems.

4.2.6 PhotometrySearch

None of the systems appear to cover this case, and there are no references to software for interconverting magnitudes and fluxes except in NED, which is trying to go in the opposite direction, from photometry to coarse spectra.
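For reference, the interconversion of magnitudes and fluxes that the use case calls for is a one-line formula once a zero point is fixed. The sketch below assumes the AB system, in which a magnitude-0 source has a flux density of approximately 3631 Jy; for Vega-based magnitudes the zero point differs from band to band and would have to be supplied by the caller. The function names are our own.

```python
import math

AB_ZERO_POINT_JY = 3631.0  # approximate flux density of a magnitude-0 source (AB system)

def mag_to_flux(mag, zero_point_jy=AB_ZERO_POINT_JY):
    """Convert a magnitude to a flux density in Jy."""
    return zero_point_jy * 10.0 ** (-0.4 * mag)

def flux_to_mag(flux_jy, zero_point_jy=AB_ZERO_POINT_JY):
    """Convert a flux density in Jy to a magnitude."""
    if flux_jy <= 0.0:
        raise ValueError("flux density must be positive")
    return -2.5 * math.log10(flux_jy / zero_point_jy)
```

A step of 2.5 magnitudes corresponds to a factor of 10 in flux, so `mag_to_flux(2.5)` is one tenth of the zero point.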

4.2.7 PosteRestante

None of the systems even attempt this except for 2MASS (accessed through IPAC) which has a batch system for producing image extracts.

4.2.8 SelectAstrometricStandards, SetImageWCS

None of the systems can do these cases as written. There is no support for actually doing the astrometric fit, nor for plotting the residuals on the fit.

Vizier and Simbad can do most of SelectAstrometricStandards, but they cannot select the "best" catalogue out of the many available. Aladin doesn't help with this case, since the idea is to automate the process, not to do it interactively.

NED is not very helpful, since stars are needed, not extra-galactic objects (but QSOs may be valuable in future).

4.2.9 SyntheticSpectra

NED can do this very nicely, but only for one object at a time. The initial selection of data is not quite as general as in the use case. The plot is done as a web graphic displayed in a web browser.

Simbad, Vizier and Aladin can't do this work. It isn't even straight-forward to extract the photometry so that one can do it manually.

4.2.10 SelectionOfTrustedCatalogues

None of the systems allow this work to be done as stated.

NED allows references to be looked up, but not using bibcodes.

Vizier, Simbad and Aladin do not support bibcodes as a search term, but they do return bibcodes in the results of their searches. The CDS bibliographic service does good searches by bibcodes.

None of the software helps with handling the list of data and bibcodes as suggested in the use case. Aladin could be used to display the objects in the user's catalogue and the user could then cross them off as the bibcodes were checked by drawing into the graphics overlay.

4.2.11 Use cases involving authorization and authentication

None of the systems inspected here deal with these issues.

(4.3) Recent VO Prototypes

The facilities examined in section (4.1) were those which existed before the AstroGrid project started and our survey of them was completed in early 2002. Here we present information on newer VO-related projects.

4.3.1 Sky Server

The site skyserver.sdss.org provides public access to the data products from the Sloan Digital Sky Survey (SDSS). About 80 GB of data (14 million objects) from the first year's scans are currently available. The original plan by Johns Hopkins University was to use an object-oriented DBMS (Objectivity/DB) but various problems with performance and software support led them to switch to a purely relational solution: Microsoft SQL Server. This transition seems to have been remarkably smooth, but the assistance of Jim Gray, a Microsoft "Distinguished Engineer" and manager of Microsoft's Bay Area Research Center (BARC) surely had a lot to do with this.

The SkyServer database is hosted at Fermilab, but it is managed jointly from BARC and JHU. The structure (schema) of the set of relational tables was designed after a set of 20 typical queries was defined by Alex Szalay and his colleagues at JHU: this represents, then, another design which has been use-case driven (but here without the aid of UML). The web server is based on Microsoft's Terraserver, and uses many other Microsoft products and standards such as IIS and Active Server Pages, but considerable efforts have been made to make the resulting web-site accessible from browsers of all kinds. In general the clients only need Javascript, but there is one applet, SkyServerQA, which can be downloaded. This makes the site very portable, but some of the Javascript appears to put a heavy load on a PC, making other screen updates noticeably slow when the SkyServer screen is visible.

The SDSS scientists are especially interested in galactic clustering and large-scale structure of the universe. To make spatial queries run quickly they created an index based on the HTM (Hierarchical Triangular Mesh). Their SkyQuery service is designed to support spatial joins with two other large catalogues: 2MASS (at Cal Tech) and the VLA FIRST survey. At present only small chunks of these are on-line in a compatible form. The SkyQuery execution language is based on SQL, and execution uses their own optimiser to minimise the inter-site data transfers involved. This technology seems to work well on the current data samples, but it is not clear to us how well it will scale up to cover substantial fractions of the sky, which will inevitably involve bulk transfers of information from one server to another over the wide-area network.
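The idea behind a hierarchical triangular mesh can be sketched in a few lines: the sphere is divided into the eight faces of an octahedron, and each spherical triangle is split recursively into four by joining the midpoints of its sides. The code below is a simplified illustration of that idea only. The face labels and child numbering follow the scheme as it is commonly described, but this is an assumption on our part and the output should not be taken to match the identifiers used inside SkyServer.

```python
import math

def _unit(ra_deg, dec_deg):
    """Unit vector for a position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _midpoint(a, b):
    """Midpoint of the great-circle arc from a to b, normalised to unit length."""
    s = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    n = math.sqrt(_dot(s, s))
    return (s[0] / n, s[1] / n, s[2] / n)

def _insideness(p, tri):
    """Non-negative when p lies inside the (counter-clockwise) spherical triangle."""
    v0, v1, v2 = tri
    return min(_dot(_cross(v0, v1), p), _dot(_cross(v1, v2), p), _dot(_cross(v2, v0), p))

_V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
_ROOTS = {
    "S0": (_V[1], _V[5], _V[2]), "S1": (_V[2], _V[5], _V[3]),
    "S2": (_V[3], _V[5], _V[4]), "S3": (_V[4], _V[5], _V[1]),
    "N0": (_V[1], _V[0], _V[4]), "N1": (_V[4], _V[0], _V[3]),
    "N2": (_V[3], _V[0], _V[2]), "N3": (_V[2], _V[0], _V[1]),
}

def htm_name(ra_deg, dec_deg, depth):
    """Name of the triangle containing a position after `depth` subdivisions."""
    p = _unit(ra_deg, dec_deg)
    # Choosing the best-scoring triangle (rather than the first with a
    # non-negative score) keeps the walk robust against rounding at edges.
    name, tri = max(_ROOTS.items(), key=lambda item: _insideness(p, item[1]))
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = _midpoint(v1, v2), _midpoint(v0, v2), _midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        index = max(range(4), key=lambda i: _insideness(p, children[i]))
        name += str(index)
        tri = children[index]
    return name
```

Because a triangle's name is a prefix of the names of all triangles inside it, a region of sky maps onto a small set of name prefixes, which is what allows an ordinary relational index to serve spatial queries.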

4.3.2 The Virtual Sky Project

The Virtual Sky Project has been set up as a collaboration between Cal Tech, Microsoft Research, the Sloan Sky Survey, and Johns Hopkins astronomers. The portal is at virtualsky.org and describes its purpose like this:

The Virtual Sky provides stunning, seamless images of the night sky; not just an album of popular places, but the entire northern sky at high resolution. Virtual Sky has ingested the complete DPOSS survey (Digital Palomar Observatory Sky Survey), with an easy-to-use, intuitive interface that anyone can use.

The interface is indeed easy to use, and the subject headings, for example Popular Attractions and Some pretty things, suggest that the site is aimed mainly at amateur astronomers and interested members of the general public.

The DPOSS (Digital Palomar Observatory Sky Survey) is the principal local resource; the maximum resolution (1.4 arcseconds/pixel) is fine for on-line viewing, but professional users are likely to want images which have not been resampled and with comprehensive metadata, and are likely to find the facilities of SkyView (at GSFC) more appropriate. Other Virtual Sky resources (ROSAT, Hubble deep field, VLA survey, etc.) are provided by links to the sites of these observatories (and a few of the links needed updating).

(4.4) Discussion

4.4.1 Use-cases

Some of the systems have query interfaces somewhat like those we shall want to provide in the "VO Portal", especially Vizier, the fancier bits of NED, and the facility in Astrobrowse which allows the concurrent searching of multiple web-sites. However, all the systems have the same basic philosophy: display lots of data and metadata in a web page and give chains of hyperlinks to even more data. They make no attempt to provide consistency in the results from disparate sources, as this would be very difficult with the existing infrastructure.

The sites, notably Vizier, Aladin, and SkyView, which make it possible to search a number of datasets in an integrated fashion have managed this by providing all the data in the right format locally. One of the principal aims of the VO projects will be to provide similar facilities but from federations of data accessed from their original locations.

Most of the use cases were not supported because they involved the technique "do a search, then do something specific with the results of the search". The VO-like archives are not set up to handle the "do something with the results" part, since they only represent the results as web-pages, not as semantically-useful data held for further processing. The exception is the making of synthetic spectra in NED, and this is a specific application - a vertical integration - that has clearly been coded in specially. It's not the kind of processing that a user can set up using a script and separate services at NED.

Some of the use-cases failed because the various archives do not have uniform criteria for selecting objects. In any given query, the selection criteria must either be on quantities that the interface designer coded into the UI, or there must be a free-form interface for specifying other criteria: a query language known to the user. The existing systems don't expose a query language, and their web interfaces only deal with a few quantities.

The use case GetReducedSpectra fails because the systems do not seem to provide reduced spectra. They only deal in images and tables.

The systems don't seem to deal in identified usage. Presumably, this means that they allow less access to data than a given user is entitled to.

In general, the systems reviewed make it easier to look up data that you could otherwise obtain by trawling through paper journals or by using the interfaces to individual large archives. They require you either to know what you are looking for at the start (e.g. which catalogues to search) or to be prepared to spend a long time browsing. The output of a search resembles that of a search in a paper collection: text you can read, but not machine-readable data products.

4.4.2 Conclusions

The systems studied here have a wealth of good features, many of which we need to emulate, but we were also able to identify a number of missing features and weaknesses in current systems which the VO alliance needs to address. These include:

  • Searches over distributed resources are important but difficult, because of a lack of agreed standards for queries (both simple and advanced), for metadata, and for the results (both extracts from tables and from images).

  • Resource discovery at present requires expert knowledge - a scalable resource discovery mechanism is needed.

  • These web sites all support interactive queries, but few have any facilities for batching them up, e.g. to retrieve results from a list of interesting celestial positions.

  • The ability to do cross-identifications between catalogues on different sites is important (via the fuzzy-join algorithm) but facilities for this are rare and hard to use at present, and bandwidth may limit what can be done over the network.

  • It is possible to construct services, such as Simbad, Vizier, and Aladin, which are separate but so well-linked that they appear as an integrated system, but these are exceptional and they are all co-located. If services on separate sites could be as well integrated, this would be a good step towards the VO.

  • We need to consider how best to support the study of time-varying and transient phenomena, somewhat neglected at present.

  • These archive sites used a variety of commercial and free DBMS (Sybase, Ingres, Oracle, SQL Server, MySQL, and probably others) as well as some home-grown database systems. Web Services interfaces will be needed for almost all of them.
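As an illustration of the cross-identification point above, a positional "fuzzy join" pairs each source in one catalogue with the nearest source in another, provided the two lie within a chosen tolerance. The sketch below is a brute-force version with hypothetical names and a simple (name, RA, dec.) row layout of our own; a practical service would need a spatial index to avoid comparing every pair, and would have to limit the data moved between sites.

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    a1, d1, a2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((a2 - a1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cross_match(cat_a, cat_b, tolerance_arcsec):
    """Nearest-neighbour join of two catalogues within a positional tolerance.

    Each catalogue is a list of (name, ra_deg, dec_deg) rows. Returns a list
    of (name_a, name_b, separation_arcsec). Cost is O(len(cat_a) * len(cat_b)).
    """
    matches = []
    for name_a, ra_a, dec_a in cat_a:
        best = None
        for name_b, ra_b, dec_b in cat_b:
            sep = angular_separation(ra_a, dec_a, ra_b, dec_b) * 3600.0
            if sep <= tolerance_arcsec and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0], best[1]))
    return matches

# Hypothetical rows, invented purely for illustration.
optical = [("opt-1", 150.0000, 2.0000), ("opt-2", 150.1000, 2.1000)]
radio = [("rad-1", 150.0003, 2.0002), ("rad-2", 151.0000, 3.0000)]
pairs = cross_match(optical, radio, tolerance_arcsec=2.0)
```

Here only "opt-1" and "rad-1" are paired; the choice of tolerance depends on the positional accuracy of the two catalogues and is the "fuzzy" part of the join.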

-- ClivePage - 28 Nov 2002
