4 Virtual Observatory Prototypes
4.0 Introduction
The idea of the Virtual Observatory arose gradually over a period in which astronomical data archive sites continually improved their facilities and user interfaces, but in a piecemeal and uncoordinated way. Many of us realised that a coherent and planned approach, with standardised interfaces, would permit a radical advance. As part of
AstroGrid's initial programme, therefore, we carried out a short survey of the most advanced existing web sites and software packages to determine the facilities they provided and how they worked. Nearly all of these rely on technical solutions which pre-date Data Grids and the Web Services paradigm, so we did not expect there to be much scope for direct technology transfer, but the facilities reflect the perceived needs of the astronomical community, and we hoped to learn a lot from them about what
AstroGrid needs to provide, and note the strengths and weaknesses of the current solutions.
These investigations were carried out as a joint exercise between our grid technology and database technology teams. It should be noted that they represent a snapshot of practice in the early part of 2002, and that many facilities have changed since then.
4.1 Current Services, Sites, and Software
4.1.1 Astrobrowse
Astrobrowse is one of the web interfaces provided by the High Energy Science Archive Research Center (HEASARC) at NASA Goddard Space Flight Center (GSFC). Its most advanced facility is a distributed cone search: users can enter the coordinates of an object and a search radius
into a single form and then search a large number
of different on-line astronomical catalogs from around the world. Alternatively one can specify the name of an object, and NED or Simbad will be used to determine its coordinates.
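At the heart of any cone search is an angular-separation test between the search centre and each catalogue position. The following is a minimal sketch of that test, assuming coordinates in decimal degrees and a catalogue held as a list of (name, RA, Dec) tuples; the function names and the sample rows are our own illustration and are not taken from Astrobrowse.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(rows, ra0, dec0, radius_deg):
    """Keep the rows lying within radius_deg of the centre (ra0, dec0)."""
    return [
        row for row in rows
        if angular_separation_deg(ra0, dec0, row[1], row[2]) <= radius_deg
    ]

# Hypothetical catalogue rows: (name, RA in degrees, Dec in degrees).
catalogue = [
    ("src-a", 10.00, 41.00),
    ("src-b", 10.05, 41.02),
    ("src-c", 12.00, 40.00),
]
hits = cone_search(catalogue, 10.0, 41.0, 0.1)
print([name for name, _, _ in hits])
```

A real archive would apply the same test inside the database, normally after a coarse pre-selection on a spatial index, rather than scanning every row.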
There is a choice of a quick or full search, the latter form giving many
more options and allowing a more selective search of archive sites.
Bandpass, data type, and other keywords can be specified
to refine the selection. One can also select which types of service
to interrogate (e.g. optical or radio), or which individual servers, with considerable flexibility.
Astrobrowse uses AstroGLU software, a development of the GLU system written
at CDS (Strasbourg). This contains a list of URLs of each data archive
service and the parameters each requires, which it uses to generate a customised CGI query
for each of them in turn. Only one or two sites have adopted the conventions proposed by CDS, so the GLU database is essentially a loving compilation of the idiosyncrasies of each site.
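The idea of generating a customised query for each site from a table of per-site conventions can be sketched as below. This is only an illustration of the approach, not the GLU implementation: the two service profiles, their URLs and their parameter names are hypothetical, and we assume each site differs only in parameter naming and in the unit used for the search radius.

```python
from urllib.parse import urlencode

# Hypothetical service profiles: each site names its parameters differently
# and may expect the search radius in different units.
PROFILES = {
    "archive-one": {
        "base": "http://archive-one.example.org/cgi-bin/search",
        "params": {"ra": "RA", "dec": "DEC", "radius": "SR"},
        "radius_unit": "deg",
    },
    "archive-two": {
        "base": "http://archive-two.example.org/query.pl",
        "params": {"ra": "lon", "dec": "lat", "radius": "rad_arcmin"},
        "radius_unit": "arcmin",
    },
}

def build_query_url(profile, ra_deg, dec_deg, radius_deg):
    """Translate one generic cone-search request into a site-specific URL."""
    radius = radius_deg * 60.0 if profile["radius_unit"] == "arcmin" else radius_deg
    names = profile["params"]
    query = {
        names["ra"]: f"{ra_deg:.5f}",
        names["dec"]: f"{dec_deg:.5f}",
        names["radius"]: f"{radius:g}",
    }
    return profile["base"] + "?" + urlencode(query)

# One generic request fans out into one URL per registered service.
urls = {name: build_query_url(p, 10.68471, 41.26875, 0.25) for name, p in PROFILES.items()}
for name, url in urls.items():
    print(name, url)
```

Keeping the site-specific details in data rather than in code is what allows new services to be added without touching the query generator.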
Astrobrowse then waits for the various replies to come back. This may be slow if
one selects a large number of servers around the world:
the results page has a side-bar reporting the status of each service, with
an option to refresh this at intervals. In practice it takes only a few
seconds to get results from most servers, but it may be necessary to
wait for a few minutes for all of them to respond (and at times some responses
never appear). Astrobrowse has not attempted to solve the problem of integrating
the results, which appear in a wide variety of formats, requiring some expertise to understand. This is obviously
a difficult problem, but one which the VO needs eventually to solve (e.g.
with the aid of VOTable and UCDs).
The software is available for download, and a version is also in use at Harvard/Smithsonian Astrophysical Observatory. Our tests were carried out on version 1.7.
4.1.2 Browse/W3Browse
Browse also comes from HEASARC at GSFC: it is essentially a search engine for tabular data, originally designed
for data from high-energy observatories, but now broader in scope,
including many radio and optical catalogues, as well as links to Vizier.
The underlying DBMS is currently Sybase, but some attempt has been made
to keep the software DBMS-independent. Version 6.3 was current when these
comments were made. As with many astronomical archives, the primary search
is by position or object name, resolved using Simbad or NED. Results can be
produced in four formats: plain text, HTML tables, FITS tables, or Astrores
XML.
Although the basic facilities are similar to many other data archives, Browse has two particular strengths:
- It can cross-correlate (join) two or more tables on celestial position
(or date/time of observation). The results of the join can be sorted, or
plotted if the browser is Java-enabled (but this often turns out to be slow).
- Browse also has links to original datasets so that, having found
the required observation, users can download data products from observatory
missions. These two features: joining tables, and downloading data products,
are very valuable, and must form a feature of our VO design.
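A join on celestial position amounts to pairing each row of one table with the nearest row of the other, provided the two lie within some tolerance. The sketch below shows a brute-force version of that idea; it is not how Browse does it, and the table layout (dictionaries with "id", "ra" and "dec" keys, in degrees) and the sample rows are assumptions made for illustration.

```python
import math

def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector for a position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def cross_match(table_a, table_b, tol_arcsec):
    """Pair each row of table_a with its nearest row in table_b within tol_arcsec."""
    # Compare chord lengths between unit vectors: chord = 2 sin(theta / 2).
    max_chord = 2.0 * math.sin(math.radians(tol_arcsec / 3600.0) / 2.0)
    vectors_b = [(row, unit_vector(row["ra"], row["dec"])) for row in table_b]
    pairs = []
    for row_a in table_a:
        va = unit_vector(row_a["ra"], row_a["dec"])
        best, best_chord = None, max_chord
        for row_b, vb in vectors_b:
            chord = math.dist(va, vb)
            if chord <= best_chord:
                best, best_chord = row_b, chord
        if best is not None:
            pairs.append((row_a["id"], best["id"]))
    return pairs

# Hypothetical X-ray and optical source lists.
xray = [
    {"id": "X1", "ra": 150.0000, "dec": 2.0000},
    {"id": "X2", "ra": 150.1000, "dec": 2.2000},
]
optical = [
    {"id": "O1", "ra": 150.0005, "dec": 2.0003},
    {"id": "O2", "ra": 151.0000, "dec": 3.0000},
]
pairs = cross_match(xray, optical, 5.0)
print(pairs)
```

The cost grows as the product of the two table sizes, so for large catalogues a sky-partitioning index would be needed before the pairwise comparison.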
HEASARC's web-based browse service was for a time known as W3Browse to
distinguish it from the original BROWSE service accessed by
telnet.
The software was originally written at ESOC around 1980 as the interface
to the EXOSAT Observatory's data archive. The service later moved to ESTEC,
then to HEASARC, and versions were subsequently installed at other sites
including LEDAS (Leicester) and MPE (Garching). The original telnet service
is still available at LEDAS, although usage is dwindling. A few of the
useful features of the original Browse have been lost in the web version,
e.g. the ability to save the results of a filtering (select) operation
and then make further selections on that. These features may be beyond
the scope of a simple web service, but certainly ought to be provided by
a data mining service. This is something that our
MySpace concept should be able to address.
4.1.3 CURSA
CURSA is a Starlink package for manipulating astronomical catalogues and tables. It is mostly concerned with accessing catalogues held as local files, but also provides some facilities
for searching remote catalogues. Remote catalogue searches are available
within the GUI-based catalogue browser xcatview and from the
Unix command-line by using the application catremote. The only
type of remote search supported by either application is the `cone search'
to find objects within a specified angular separation of a specified central
celestial coordinate. Optionally, the name of an astronomical object may
be given instead of a central coordinate and the SIMBAD or NED name-resolver is used to
replace the object name with the corresponding coordinates. For some catalogues
catremote also allows additional selections on pre-defined
columns (for example, limiting the selected objects in the specified region
of the sky to also lie in a given magnitude range).
CURSA uses exactly the same mechanisms and formats as SkyCat and GAIA
for submitting queries to a remote catalogue and returning the table of
results, and has the same advantages and limitations. The most notable limitation
is that it is only possible to search one remote catalogue at a time.
xcatview's GUI for searching remote catalogues has a rather different layout to SkyCat's.
catremote is suitable for embedding in scripts as well as
for interactive use from the command-line. Searching elements of the VO
from within scripts which perform specialised, bespoke tasks seems a likely
requirement for the VO.
It is also worth noting that CURSA is based on the FITSIO library written at GSFC, which has FTP and HTTP protocols for data access, so that all the CURSA tasks can access remote tables if they are present on FTP or HTTP repositories in FITS table format. They do this, however, by copying the entire file to local memory or scratch disc. The same facility is built in to the FTOOLS utilities for handling FITS files provided by GSFC. For small tables this is fine, but for large tables bandwidth can be a problem.
The tests were done on CURSA version 6.4. Documentation is provided
in two Starlink documents, SUN/190 and SSN/76.
4.1.4 ISAIA
The ISAIA project, led by GSFC, was intended to develop a number of virtual observatory
concepts, such as the integration of results from queries such as those
sent out by systems like Astrobrowse. The project is
no longer active, and those involved are now part of the NVO team. The website
contains several useful documents, but these are now being overtaken by
more recent developments.
4.1.5 MAST
MAST (Multi-mission Archive at Space Telescope) comes from the Space Telescope Science Institute (STScI) and is the optical/UV/near-IR component of NASA's distributed Space Science Data Services. It aims to provide integrated access to data from a range of missions and projects, namely: Hubble Space Telescope (HST), Far Ultraviolet Spectroscopic Explorer (FUSE), International Ultraviolet Explorer (IUE), Extreme Ultraviolet Explorer (EUVE), Hopkins Ultraviolet Telescope (HUT), Ultraviolet Imaging Telescope (UIT), Wisconsin Ultraviolet Photo-Polarimeter Experiment (WUPPE), Copernicus (OAO-3), Orbiting and Retrievable Far and Extreme Ultraviolet Spectrometer (ORFEUS), Berkeley Extreme and Far-UV Spectrometer (BEFS), Interstellar Medium Absorption Profile Spectrograph (IMAPS) (first flight), Tübingen Echelle Spectrograph (TUES), Digitized Sky Survey (DSS), Guide Star Catalog II (GSCII), Sloan Digital Sky Survey (SDSS), FIRST (VLA radio data), Roentgen Satellite (ROSAT).
MAST offers a series of cross-mission search tools, which
vary from tools directed to specific science cases (e.g. find all data in
MAST archives close to Abell clusters) to a Single Target quick search
interface, where one can enter coordinates or source name (to be resolved
by NED or SIMBAD), and get a list of data available within MAST: e.g. entering
M31 yields the predictable long list of data - one nice feature is that
preview versions of images (in GIF format) are provided. This interface also
enables searches by data type - e.g. checking the "X-ray spectra"
box returns a link to the top page of MAST's ROSAT site, as well as a helpful
note that HEASARC provide access to a wider range of high-energy data. There
is also the MAST Scrapbook, which offers users "representative images or spectra of an astronomical
object" (specified by coordinates or resolvable name, as before) -
basically another way of asking for preview data.
MAST also hosts a series of Prepared Science Products.
Examples of these are the Hubble Deep Field (North and South) datasets,
various UV spectral atlases and the SDSS Quasar Catalog, derived from the Early Data Release of the Sloan Digital Sky Survey.
A page on Data Analysis Software lists "some of the data analysis software packages used for the MAST
archived data", mostly comprising IRAF packages written for specific
instruments. Data transfer varies between the different archives in the
MAST system, but is usually either via anonymous ftp or direct downloading
from a WWW browser.
Overall, MAST is a good example of a current-generation archive
site: it provides access to data from a number of sources in a reasonably
coherent and user-friendly manner, with a reasonable amount of documentation
(the documentation for the SDSS EDR site has improved markedly in recent
months) and a Helpdesk which is staffed Mon-Fri 09:00-17:00 (EST). It is,
however, very much an interactive service: there is no obvious batch-mode facility, nor any
other means of submitting large numbers of queries in an automated fashion.
4.1.6 NED
NASA's Infrared Processing and Analysis Center (IPAC) at Caltech provides the NASA/IPAC Extragalactic Database (NED). It is built around a master
list of extra-galactic objects for which cross-identifications of names
have been established, accurate positions and redshifts entered to the extent
possible, and some basic data collected. Bibliographic references relevant
to individual objects have been compiled, and abstracts of extra-galactic
interest are kept on line. Detailed and referenced photometry, position,
and redshift data, have been taken from large compilations and from the literature.
NED also includes images for over 700,000 extra-galactic objects from 2MASS,
from the literature, and from the Digitized Sky Survey. NED's data and
references are being continually updated, with revised versions being put
on-line every 2-3 months. In essence, therefore, NED provides facilities somewhat similar to those of CDS, but only for identified extragalactic objects, and specially adapted to the needs of those studying them.
4.1.7 Querator
Querator is a tool for extracting images of a given region of sky
from image surveys. It was developed by Francesco Pierfederici at the European Southern Observatory (ESO). In a single query it can extract images
of the same region of sky from several different surveys, thus allowing
a stack of images, typically in different colours or wavelength ranges, to
be returned. Currently images from the HST and various ESO telescopes are
available. The region of sky to be extracted can be specified in a number
of ways:
- object name,
- sky box,
- external server search,
- user file upload.
The object name, sky box and user file upload options are all as
would be expected. In the first two cases the user gives, respectively,
an object name or the meridians of Right Ascension and parallels of Declination
defining a region of sky. Additional constraints (exposure time, observation
date, wavelength range, instrument etc.) can be specified to refine the
search. In the user file upload option the user gives the name of a prepared
file containing a list of object names or coordinates. This useful option
allows images to be retrieved for a number of regions in a single operation.
The "external server search" option is more interesting and innovative.
Here the user submits a query to a remote catalogue archive (such as LEDA or the NASA ADC catalogue collection), which
is quite separate from the data centre holding the image surveys, in order
to search one or more catalogues according to an arbitrary criterion the
user has supplied. The remote catalogue archive returns a list of objects
which satisfy the query. Querator takes this list and retrieves images for
all the objects listed.
Access to Querator is solely through a Web interface, which is generally
easy to use. However, constructing the query for the remote archive in
the "external server search" option is complicated. The query is constructed
using the native syntax of the remote service and thus varies between
different services. Querator seems to still be under active development.
The query pages crash netscape running on a Compaq/Alpha but
are OK on Sun/Solaris (though this could be a bug in netscape
for all that I know). The surveys currently accessible mostly seem to consist
of pointed observations. However, presumably, there is no reason why Querator
could not access contiguous surveys, such as the DSS.
Querator has a number of features which seem likely to be required in
the VO, including the ability to retrieve a stack of images from several
surveys in one query and something analogous to the "external server search"
option. However, to make the latter easy to use a unified (and simple)
query syntax to specify queries on all the remote catalogues is required.
No version number was given. The tests were conducted on 13 February
2002.
4.1.8 Simbad, Vizier, and Aladin
These interwoven services are provided by the Centre de Données astronomiques de Strasbourg (CDS), the oldest and probably largest collection of astronomical data resources in the world. There are many interconnections between the separate services which make the system easier to use, but make it harder to see which part does
what. Essentially:
- Simbad is principally a bibliographic archive, which
includes information about all papers in the primary astronomical literature
about objects beyond our solar system, including the properties of the celestial objects listed therein. This means that Simbad holds many of the small surveys in the literature (e.g. fields of the 5th Cambridge catalogues of
radio sources) but not the massive data-collections like SDSS. Naturally,
the types of measurement are very varied. Simbad's web interface only
allows queries on a few basic parameters, most of which are biased towards
stellar astronomy. We note that Simbad's object classification scheme needs to be considered in our ontology efforts.
- Vizier is billed as a "catalogue of catalogues" which
underplays what it can be used for. There are two main uses: selecting
catalogues (by criteria such as "contains QSOs") and listing the descriptions
of those catalogues; selecting objects from the union of all the catalogues
by various criteria. That is, Vizier allows both metadata and data searches.
The fact that these two modes are driven from the same interface-page makes
Vizier harder to use than it need be. The "union of all catalogues" seems
to mean the catalogues absorbed into Simbad plus major external data-collections
such as 2MASS.
- Aladin is an image display with advanced overlay-features.
The Vizier and Simbad operations can display results by returning a web
page in which Aladin runs as an applet with the data preloaded. Alternatively,
Aladin can run as an application and can send queries to Simbad and Vizier
in response to user actions. Aladin allows overlay plots from many catalogues
to be stacked up, and provides good controls for manipulating the stack
(e.g. controlling visibility of particular planes).
- The bibliographic database lists the papers from which
data were taken for Simbad. This makes it excellent for use with Simbad
and dangerous to use for any other purpose due to the specialized pre-selection.
The service also holds on-line abstracts of recent papers in selected
journals.
The popularity of these services is shown by the existence of several mirrors: Simbad has a mirror in the USA, while there are already half a dozen mirrors of Vizier, including one in the UK (Cambridge). Simbad is also the principal name resolver (translator from celestial object name to coordinates) used by many other sites around the world: it is an obvious candidate for conversion to a
Web Service using SOAP/WSDL.
4.1.9 SkyView and SkyMorph
SkyView is another service of HEASARC at GSFC. It describes itself in these terms:
SkyView is a Virtual
Observatory on the Net generating images of any part of the sky at wavelengths
in all regimes from Radio to Gamma-Ray.
The SkyView server contains copies of images of the sky taken in a wide
range of wavebands from radio to gamma ray, mostly (perhaps all) stored
as FITS files. The SkyView software, written at GSFC, selects and overlays
these images, giving results in one's chosen resolution, and it automatically
handles rotation, precession, coordinate transformations, and pixel re-sampling.
The results can be seen on the screen, or an image can be downloaded
from an FTP area in a number of formats including FITS, TIFF, GIF, and
PostScript.
There are actually five different interfaces: for the non-astronomer, basic,
advanced, Java, and X-windows. The latter is regarded as obsolescent, now
that Java can provide the required controls. More advanced image options,
such as changes to color tables, overlays on extent images, image rescaling,
zooming, etc. require a Java-enabled browser.
Advanced options include the ability to overlay data from two or three
different data sources, perhaps mapping each to a different primary colour,
producing a pseudo-colour result. This is even possible for those using
8-bit displays.
SkyView can also apply boxcar averaging to an image,
to obtain a smoothed result.
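Boxcar averaging replaces each pixel by the unweighted mean of the pixels in a square box centred on it. The following is a minimal sketch of the technique in plain Python, assuming the image is a list of equal-length rows and that pixels near the edge are averaged over the part of the box lying inside the image; it is our own illustration, and we have not examined how SkyView handles edges or blank pixels.

```python
def boxcar_smooth(image, width):
    """Return a copy of a 2-D image (list of rows) smoothed with a width x width boxcar.

    Assumption: near the edges the mean is taken over the in-image part of the box.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    n_rows, n_cols = len(image), len(image[0])
    smoothed = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            window = [
                image[y][x]
                for y in range(max(0, i - half), min(n_rows, i + half + 1))
                for x in range(max(0, j - half), min(n_cols, j + half + 1))
            ]
            out_row.append(sum(window) / len(window))
        smoothed.append(out_row)
    return smoothed

# A single bright pixel is spread evenly over the 3 x 3 box around it.
image = [
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0],
]
smoothed = boxcar_smooth(image, 3)
print(smoothed[1][1], smoothed[0][0], smoothed[0][1])
```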
There is also a batch option, with Perl scripts which can be downloaded
and run on a Unix/Linux system.
The software, written by Tom McGlynn and his team,
is all freely available for downloading and external use.
The comments here apply to Web Version 4.1, with version 3.2 of their
Geometry Engine. A new interface is currently (2002-02-01) on beta-test
and allows the interface to be customized; a few functions did not seem to
be working correctly when tested.
Overall the SkyView facilities for image selection and display are so comprehensive, and so well covered by the documentation, that it is hard to think of features still lacking.
But it must be noted that this facility has been provided entirely using
local storage, by taking copies of datasets produced elsewhere and, where necessary, reformatting them to suit SkyView. It was, apparently, decided when SkyView was designed that the Internet did not have enough bandwidth to allow the retrieval of images from remote sites.
SkyMorph specialises in searches for variable, moving or transient objects.
It provides convenient access to optical images and catalogs generated by
the Near Earth Asteroid Tracking (NEAT) program. These include more than
67,000 CCD images covering a large fraction of the sky. The same region
is typically observed several times each night, and is revisited on monthly
and yearly timescales. SkyMorph appears to be based on SkyView, and seems to have few unique features of VO importance, but it is one of the few services which supports the time dimension, which
AstroGrid must not neglect.
4.1.10 Skycat, GAIA, and JSkyCat
SkyCat is an image display tool developed by Allan Brighton and colleagues
as part of the ESO VLT project. GAIA is an enhancement of SkyCat by Starlink, which has
added numerous astronomical analysis facilities, including: astrometric calibration,
automatic object detection, and aperture, optimal and surface photometry.
Both SkyCat and GAIA are mostly concerned with accessing local files.
However, they both contain some limited facilities for accessing remote catalogues
and image surveys. GAIA's facilities in this area are identical to SkyCat's
and the following notes apply to both applications.
SkyCat and GAIA can access a reasonably extensive remote collection of
standard astronomical catalogues and a few image surveys, principally the
HST Digitised Sky Survey (DSS).
The principal purpose of remote catalogue searches in SkyCat and GAIA is
to find objects which overlay an image that has already been displayed by
the application (though searches can be made which are not connected with
any image). Consequently, the only type of remote search supported is
the "cone search" to find objects within a specified angular separation
(or `radius') of a specified central celestial coordinate. Optionally, the
name of an astronomical object may be given instead of a central coordinate
and the SIMBAD or NED name-resolver is used to
replace the object name with the corresponding coordinates. For some catalogues
additional selections are also supported on pre-defined columns. For example,
it may be possible to select objects which lie in the specified region of
sky and which also lie within a given magnitude range.
Regions of sky can be extracted from image surveys by specifying the
central coordinates and size of the field required. Again, optionally, an
object name can be substituted for the central coordinates.
Skycat and GAIA have a convenient user-interface which is well-integrated
with the rest of the display functions. Retrieved objects are automatically
plotted on top of a displayed image if they overlay it. It is easy to highlight
a given object in both a table of the selected objects and in an image overlay
plot.
The list of remote catalogues and image surveys available to Skycat and
GAIA is held as a text file. This arrangement is good in that the list
is not hard-wired into the code and can be customised, but is bad in that
the file has to be edited manually, rather than maintained automatically
by a `resource register' of the sort that we have been discussing.
Queries are submitted and results returned using HTTP protocols. The
query format is somewhat restricted (and is similar to, but not identical
with, the ASU query standard). Tables are returned in the Tab-Separated Table (TST) format,
which is somewhat deficient in catalogue metadata, though it does contain
enough information to define how objects are to be plotted on overlays (ellipses
etc). Images are returned as FITS files.
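To show how little metadata a client has to work with, the sketch below parses a small table in what we understand to be the usual TST layout: optional "keyword: value" parameter lines, a tab-separated line of column names, a line of dashes, the data rows, and an optional "[EOD]" terminator. The sample table is invented for illustration, and the layout assumed here should be checked against the SkyCat and Starlink documentation before being relied upon.

```python
def parse_tst(text):
    """Split a tab-separated table into (parameters, column names, rows).

    Assumed layout: parameter lines, column-name line, dashed line, data rows.
    """
    lines = text.splitlines()
    # The line of dashes separates the column headings from the data rows.
    dash_index = next(
        i for i, line in enumerate(lines)
        if line.strip() and set(line.replace("\t", "").strip()) == {"-"}
    )
    columns = lines[dash_index - 1].split("\t")
    parameters = {}
    for line in lines[:dash_index - 1]:
        if ":" in line and not line.startswith("#"):
            key, value = line.split(":", 1)
            parameters[key.strip()] = value.strip()
    rows = []
    for line in lines[dash_index + 1:]:
        if line.strip() == "[EOD]":
            break
        if line.strip():
            rows.append(dict(zip(columns, line.split("\t"))))
    return parameters, columns, rows

# Hypothetical result of a cone search; not output from a real server.
sample = "\n".join([
    "Example cone search result",
    "ra_col: 1",
    "dec_col: 2",
    "id\tra\tdec\tmag",
    "--\t--\t---\t---",
    "obj-1\t10.001\t41.002\t14.2",
    "obj-2\t10.050\t41.020\t15.7",
    "[EOD]",
])
parameters, columns, rows = parse_tst(sample)
print(parameters, columns, len(rows))
```

Note that every value comes back as a string: nothing in the table itself records units or data types, which is the kind of metadata deficiency referred to above.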
A VO client or portal would need to provide at least all the remote access
facilities of SkyCat and GAIA. Their principal disadvantage is that they
can only search one catalogue at a time.
The version tested was GAIA version 2.6, derived from SkyCat version 2.4.
On-line documentation for SkyCat is available from its
home page at ESO.
JSkyCat is a re-implementation of Skycat (above) in Java. It was also
developed by ESO. It has similar functionality
to the original Skycat, but has fewer features because it is still under
development. JSkyCat is written using the JSky Java class library, elements
of which are also used in the Gemini Observation
Planning Tool.
The remote catalogue and image survey access facilities in JSkyCat are
essentially identical to those in SkyCat: it provides the same functionality,
uses the same mechanisms and formats for submitting queries and returning
tables of results and has the same advantages and disadvantages.
The version tested was JSkyCat 1.2; on-line documentation is available
from the ESO web pages.
4.1.11 Starcast
Starcast, also from STScI, is MAST's prototype
implementation of Astrobrowse, described above.
The Starcast implementation currently uses a Perl interface to the profile database, not the CDS GLU system as used in the original Astrobrowse
prototype at HEASARC, but is intended to migrate to using GLU at some point:
this will mean that the Starcast administrator will not have to input the
profiles manually, as is currently the case.
The Starcast query
form allows the user to search for data around a sky position or an
object with a name that can be resolved into a sky position by NED or SIMBAD. The user then specifies the
Bandpass (running from radio to gamma ray), Data Source (with choices
Any, Derived, Observations, Pointed, Proposal, Survey, Survey Data), the Data Type
(Any, Catalog, Image, Images, Other, Spectra, Spectrum, Time-series), and Location
(Any, or a selection from
a list of about 30 international data centres) and sets the query running.
The browser moves to a new page, with two frames, one of which lists the
specification of the query, and the second gives links to the services which
might have data satisfying the query: next to each of these links is an icon
showing whether the search on that resource is running, has completed successfully
or has crashed, and these may be updated by pressing a Check status
button at the top of the frame.
The implementation of this service seems incomplete, in some sense. For
example, a test query asking for EUV data from Any Source of Any Type and at
Any Location within 10 arcmin of 10 00 00 -10 00 00 returned a number of links, one of which was to the IMPReSS interface at the NASA ADC at Goddard. Clicking on that link took me to a WWW page generated within the IMPReSS system, which
listed the sky position for my search and presented me with a list of archives
(not just EUV, but also X-ray and optical) with data around that position,
asking me which I wanted to choose. Clearly, since I'd already specified
my query on the Starcast WWW form, I should have been taken one stage further
within IMPReSS. This is something of a minor quibble, for what is, after all,
just a work-in-progress prototype implementation, but it does highlight
the difficulty of fitting a top-level query interface on top of existing
data centres, each of which provides access to its archives in a different
way.
4.1.12 Starview
StarView comes from the Space Telescope Science Institute (STScI) and its blurb says that
"StarView is an astronomical database
browser and research analysis tool. Developed in Java, StarView provides
an easy to use, highly capable user interface that runs on any Java enabled
platform as a standalone application." Download and installation (under
Windows NT) was remarkably simple, and the Java GUI is very nice. StarView
can be used to search for data in MAST archives, examine the calibrations
used for a particular dataset, and look at proposal information relating
to past HST projects.
Downloading of data through StarView requires registration
with the STScI archive, and is performed either by leaving the results files
in an anonymous ftp site, or by ftping them to the user's machine: the
latter requires supplying the user's password, and this is very unpopular
with some system managers. One nice feature is that you can track the progress
of your query on a WWW site.
Queries are defined by starting with a form (a number are
provided, as templates for searching particular archives or making particular
kinds of query), to which the user then adds qualifications to narrow
the search. Queries can be written out as SQL, which is nice, and there is
also a Cross-Qualifier feature, which allows the results of one
query to be used as input to constructing a second: this seems a very useful
feature, but the instructions are not clear enough to enable a user to use
this option at a cursory reading.
The results of a query are listed in another GUI, and datasets
can be selected from that window for further operations - e.g. previewing
images or spectra, looking in ADS for references known to have resulted from
that HST proposal, overlaying the instrument footprint on a DSS image -
while the list itself can be exported as an ASCII file. One very nice feature
is that the list of returned datasets can include proprietary ones, for
which the date of public access is listed: slightly annoyingly, one has
to remove those datasets from the list manually before asking to retrieve
the data. Surely a better default would be to exclude them, so that only
the PI (who can access them) would have to do anything. A variety of data
types can be retrieved - it is interesting that one can retrieve data quality
information and/or observing logs, in addition to the data themselves.
All in all, this is a very interesting tool, displaying much
of the functionality that one would require for the VO. As with the MAST
WWW interface, this is still very interactive, but it does have the advantage
that one can store and reload queries one has formulated interactively using
the GUI. Again as for the MAST WWW interface, there is no description of
the technology used, beyond Java.
4.2 Testing CDS and NED with use cases
In order to find out what these services can do, we tried to use them to carry out the AstroGrid use cases, but found that only a very limited subset of what we wanted to do was currently possible. The section names in the following are the Wiki-names of the use cases in the VO Wiki-web.
4.2.1 FindQSOsByPosition
Vizier and Simbad can do the main flow of this use case, using Aladin
as the display tool. Ironically, Aladin itself can't make the necessary
selection. None of these tools can merge the tables of results.
NED can do some of this work. It can't select on radius from the search
centre, but it can select on ranges of RA and dec., which is almost as good.
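Where a service accepts only ranges of RA and Dec, a cone can be approximated by the smallest such box that encloses it. The sketch below computes that box using the standard spherical result that the half-width in RA is asin(sin r / cos δ), which is wider than r away from the equator. The function name and the return convention are our own assumptions and are not part of NED's interface.

```python
import math

def cone_to_radec_box(ra_deg, dec_deg, radius_deg):
    """Smallest RA/Dec range enclosing a cone, as (ra_min, ra_max, dec_min, dec_max).

    RA limits are returned modulo 360, so ra_min > ra_max means that the
    range wraps through RA = 0.
    """
    dec_min = max(-90.0, dec_deg - radius_deg)
    dec_max = min(90.0, dec_deg + radius_deg)
    if abs(dec_deg) + radius_deg >= 90.0:
        # The cone contains a pole, so every RA is possible.
        return 0.0, 360.0, dec_min, dec_max
    half_width = math.degrees(
        math.asin(math.sin(math.radians(radius_deg)) / math.cos(math.radians(dec_deg)))
    )
    return (ra_deg - half_width) % 360.0, (ra_deg + half_width) % 360.0, dec_min, dec_max

print(cone_to_radec_box(180.0, 60.0, 1.0))
print(cone_to_radec_box(0.5, 0.0, 1.0))
```

The box always returns a superset of the cone, since its corners lie outside the search radius, so the client still has to discard the extra objects by computing their separations from the centre.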
It understands "QSOs" and "QSO clusters" as selection criteria. The plotter
("skyplot") from NED is poor (line graphics only) and is non-interactive:
there is no way to select objects on the display and get to their details
from the catalogue.
4.2.2 GetLiteratureReference
This feature is available in Simbad. Results of searches carry hyperlinks
to entries in CDS' bibliographic service. However, only selected references
are shown (to explain where the Simbad data came from, not to refer to the
science). It is possible to query the bibliographic service directly, but
the number of references returned is surprisingly small (e.g. 7 for a
search on NGC1068).
NED allows one to query the database of abstracts directly by object name.
This finds many (all?) references (e.g. 1249 for NGC1068).
4.2.3 GetReducedSpectra
There is no obvious way to get any actual spectral pixel-data from any
of these systems except one small part of IPAC. The SWAS mission, available
through IPAC, serves spectra as either web graphics or in FITS files.
4.2.4 InstrumentFootprint
None of the systems seem to allow an instrument footprint as a search
area in a query.
Aladin allows a user to draw one of a set of limited footprints as an overlay
on an existing plot. If one then measures the footprint in Aladin one can
get a search radius that encloses the footprint, and can search on that radius
in Vizier. This allows the work to be done manually.
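The manual procedure can be sketched as follows, under the assumption of a small rectangular footprint treated in a flat-sky approximation, with corner offsets measured from the pointing centre. The function names and the example dimensions are illustrative only, and are not taken from Aladin or Vizier.

```python
import math

def rectangle_corners(width, height):
    """Corner offsets (x, y) of a rectangle centred on the pointing centre."""
    hw, hh = width / 2.0, height / 2.0
    return [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)]

def enclosing_radius(corners):
    """Smallest search radius about the centre that encloses every corner."""
    return max(math.hypot(x, y) for x, y in corners)

def inside_rectangle(x, y, width, height):
    """True if an offset (x, y) from the centre falls inside the footprint."""
    return abs(x) <= width / 2.0 and abs(y) <= height / 2.0

# Hypothetical 6 x 8 arcminute footprint.
radius = enclosing_radius(rectangle_corners(6.0, 8.0))
print(radius)  # 5.0 arcminutes
```

A cone search with this radius returns some objects outside the footprint, so the results still need a second pass with a test such as inside_rectangle.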
4.2.5 ObservingProposalCheckForData
None of the systems do this use case. There are no links to observation-proposal
systems.
4.2.6 PhotometrySearch
None of the systems appear to cover this case, and there are no references
to software for interconverting magnitudes and fluxes except in NED, which
is trying to go in the opposite direction, from photometry to coarse spectra.
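As an indication of the kind of conversion software that is missing, here is a minimal sketch for one photometric system only. It assumes magnitudes on the AB system, whose zero point is a flux density of 3631 Jy. Vega-based magnitudes need a separate zero point for each band, which is not attempted here. The function names are our own.

```python
import math

AB_ZERO_POINT_JY = 3631.0  # flux density of a source with AB magnitude 0

def ab_mag_to_jy(mag):
    """Convert an AB magnitude to a flux density in janskys."""
    return AB_ZERO_POINT_JY * 10.0 ** (-0.4 * mag)

def jy_to_ab_mag(flux_jy):
    """Convert a flux density in janskys to an AB magnitude."""
    if flux_jy <= 0.0:
        raise ValueError("flux density must be positive")
    return -2.5 * math.log10(flux_jy / AB_ZERO_POINT_JY)

flux = ab_mag_to_jy(20.0)      # about 3.631e-5 Jy
round_trip = jy_to_ab_mag(flux)
```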
4.2.7 PosteRestante
None of the systems even attempt this except for 2MASS (accessed through
IPAC) which has a batch system for producing image extracts.
4.2.8 SelectAstrometricStandards, SetImageWCS
None of the systems can do these cases as written. There is no support
for actually doing the astrometric fit, nor for plotting the residuals on
the fit.
Vizier and Simbad can do most of SelectAstrometricStandards, but they
cannot select the "best" catalogue out of the many available. Aladin doesn't
help with this case, since the idea is to automate the process, not to
do it interactively.
NED is not very helpful, since stars are needed, not extra-galactic objects (but QSOs may be valuable in future).
4.2.9 SyntheticSpectra
NED can do this very nicely, but only for one object at a time. The initial
selection of data is not quite as general as in the use case. The plot
is done as a web graphic displayed in a web browser.
Simbad, Vizier and Aladin can't do this work. It isn't even straightforward
to extract the photometry so that one can do it manually.
4.2.10 SelectionOfTrustedCatalogues
None of the systems allow this work to be done as stated.
NED allows references to be looked up, but not using bibcodes.
Vizier, Simbad and Aladin do not support bibcodes as a search term, but
they do return bibcodes in their search results. The CDS bibliographic
service does good searches by bibcodes.
None of the software helps with handling the list of data and bibcodes
as suggested in the use case. Aladin could be used to display the objects
in the user's catalogue and the user could then cross them off as the
bibcodes were checked by drawing into the graphics overlay.
4.2.11 Use cases involving authorization and authentication
None of the systems inspected here deal with these issues.
(4.3) Recent VO Prototypes
The facilities examined in section (4.1) were those which existed before the
AstroGrid project started and our survey of them was completed in early 2002. Here we present information on newer VO-related projects.
4.3.1 Sky Server
The site skyserver.sdss.org provides public access to the data products from the Sloan Digital Sky Survey (SDSS). About 80 GB of data (14 million objects) from the first year's scans are currently available. The original plan by Johns Hopkins University was to use an object-oriented DBMS (Objectivity/DB) but various problems with performance and software support led them to switch to a purely relational solution: Microsoft SQL Server. This transition seems to have been remarkably smooth, but the assistance of Jim Gray, a Microsoft "Distinguished Engineer" and manager of Microsoft's Bay Area Research Center (BARC), surely had a lot to do with this.
The SkyServer database is hosted at Fermilab, but it is managed jointly from BARC and JHU. The structure (schema) of the set of relational tables was designed after a set of 20 typical queries was defined by Alex Szalay and his colleagues at JHU: this represents, then, another design which has been use-case driven (but here without the aid of UML).
The web server is based on Microsoft's Terraserver, and uses many other Microsoft products and standards such as IIS and Active Server Pages, but considerable efforts have been made to make the resulting web-site accessible from browsers of all kinds. In general the clients only need Javascript, but there is one applet, SkyServerQA, which can be downloaded. This makes the site very portable, but some of the Javascript appears to put a heavy load on a PC, making other screen updates noticeably slow when the SkyServer screen is visible.
The SDSS scientists are especially interested in galactic clustering and large-scale structure of the universe. To make spatial queries run quickly they created an index based on the HTM (Hierarchical Triangular Mesh). Their SkyQuery service is designed to support spatial joins with two other large catalogues: 2MASS (at Cal Tech) and the VLA FIRST survey. At present only small chunks of these are on-line in a compatible form. The SkyQuery execution language is based on SQL, and execution uses their own optimiser to minimise the inter-site data transfers involved. This technology seems to work well on the current data samples, but it is not clear to us how well it will scale up to cover substantial fractions of the sky, which will inevitably involve bulk transfers of information from one server to another over the wide-area network.
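The idea behind an HTM index can be sketched as follows. The sphere is first divided into eight spherical triangles (the faces of an octahedron), and each triangle is then split recursively into four children, so that every position acquires a hierarchical name. This is a simplified illustration written for this report, not the SDSS code. The vertex ordering and naming follow our understanding of the published HTM scheme, but the output has not been checked against the SkyServer implementation.

```python
import math

def _unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def _mid(a, b):
    """Midpoint of a great-circle arc, pushed back onto the unit sphere."""
    return _unit(tuple(x + y for x, y in zip(a, b)))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _contains(tri, p, eps=1e-12):
    """True if p lies inside the spherical triangle, whatever its orientation."""
    a, b, c = tri
    s = 1.0 if _dot(_cross(a, b), c) > 0 else -1.0
    return all(s * _dot(_cross(u, v), p) >= -eps
               for u, v in ((a, b), (b, c), (c, a)))

# Octahedron vertices and the eight root triangles (assumed ordering).
_V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
_ROOTS = [("S0", (1, 5, 2)), ("S1", (2, 5, 3)), ("S2", (3, 5, 4)),
          ("S3", (4, 5, 1)), ("N0", (1, 0, 4)), ("N1", (4, 0, 3)),
          ("N2", (3, 0, 2)), ("N3", (2, 0, 1))]

def trixel_name(ra_deg, dec_deg, depth):
    """Name of the triangle containing a position after `depth` subdivisions."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    p = (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))
    for name, idx in _ROOTS:
        tri = tuple(_V[i] for i in idx)
        if _contains(tri, p):
            break
    else:
        raise ValueError("position not found in any root triangle")
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = _mid(v1, v2), _mid(v0, v2), _mid(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        for k, child in enumerate(children):
            if _contains(child, p):
                name, tri = name + str(k), child
                break
        else:
            raise ValueError("position not found in any child triangle")
    return name

coarse = trixel_name(10.0, 20.0, 5)
fine = trixel_name(10.0, 20.0, 8)
```

Because the name at a fine level always begins with the name at a coarser level, a search over a region of sky reduces to a search over ranges of names, which an ordinary relational index can handle efficiently.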
4.3.2 The Virtual Sky Project
The Virtual Sky Project has been set up as a collaboration between Cal Tech, Microsoft Research, the Sloan Sky Survey, and Johns Hopkins astronomers. The portal is at virtualsky.org and describes its purpose like this:
The Virtual Sky provides stunning, seamless images of the night sky; not just an album of popular places, but the entire northern sky at high resolution. Virtual Sky has ingested the complete DPOSS survey (Digital Palomar Observatory Sky Survey), with an easy-to-use, intuitive interface that anyone can use.
The interface is indeed easy to use, and the subject headings, for example "Popular Attractions" and "Some pretty things", suggest that the site is aimed mainly at amateur astronomers and interested members of the general public.
The DPOSS (Digital Palomar Observatory Sky Survey) is the principal local resource; the maximum resolution (1.4 arcseconds/pixel) is fine for on-line viewing, but professional users are likely to want images which have not been resampled and with comprehensive metadata, and are likely to find the facilities of SkyView (at GSFC) more appropriate. Other Virtual Sky resources (ROSAT, Hubble deep field, VLA survey, etc.) are provided by links to the sites of these observatories (and a few of the links needed updating).
(4.4) Discussion
4.4.1 Use-cases
Some of the systems have query interfaces somewhat like those we shall want to provide in the "VO Portal", especially Vizier, the fancier bits of NED, and the facility in Astrobrowse which allows the
concurrent searching of multiple web-sites. However, all the systems have
the same basic philosophy: display lots of data and metadata in a web
page and give chains of hyperlinks to even more data. They make no attempt to provide consistency in the results from disparate sources, as this would be very difficult with the existing infrastructure.
The sites, notably Vizier, Aladin, and Skyview, which make it possible to search
a number of datasets in an integrated fashion have managed this
by providing all the data in the right format locally. One of the principal
aims of the VO projects will be to provide similar facilities but from federations
of data accessed from their original locations.
Most of the use cases were not supported because they involved the technique
"do a search, then do something specific with the results of the search".
The VO-like archives are not set up to handle the "do something with the
results" part, since they only represent the results as web-pages, not as
semantically-useful data held for further processing. The exception is
the making of synthetic spectra in NED, and this is a specific application
- a vertical integration - that has clearly been coded in specially. It's
not the kind of processing that a user can set up using a script and separate
services at NED.
Some of the use-cases failed because the various archives do not have uniform
criteria for selecting objects. In any given query, the selection criteria
must either be on quantities that the interface designer coded into the
UI, or there must be a free-form interface for specifying other criteria:
a query language known to the user. The existing systems don't expose a
query language, and their web interfaces only deal with a few quantities.
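The difference between the two approaches can be sketched with a small in-memory table. This is an illustration only: the table, its columns, and the function names are hypothetical, and SQL is used here simply as an example of a query language that many users already know.

```python
import sqlite3

def make_catalogue():
    """Build a tiny in-memory catalogue with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects "
        "(name TEXT, ra_deg REAL, dec_deg REAL, redshift REAL, mag_b REAL)"
    )
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?, ?, ?)",
        [("obj1", 10.0, 5.0, 2.1, 18.5),
         ("obj2", 10.5, 5.2, 0.3, 17.0),
         ("obj3", 200.0, -30.0, 2.5, 21.0)],
    )
    return conn

def form_search(conn, ra_min, ra_max, dec_min, dec_max):
    """Fixed-form interface: only the quantities the designer chose."""
    sql = ("SELECT name FROM objects "
           "WHERE ra_deg BETWEEN ? AND ? AND dec_deg BETWEEN ? AND ? "
           "ORDER BY name")
    return [row[0] for row in conn.execute(sql, (ra_min, ra_max, dec_min, dec_max))]

def free_form_search(conn, where_clause):
    """Free-form interface: the user supplies any criterion on any column.

    A real service would have to parse and validate the clause before running it.
    """
    sql = "SELECT name FROM objects WHERE " + where_clause + " ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

conn = make_catalogue()
by_position = form_search(conn, 9.0, 11.0, 4.0, 6.0)
by_physics = free_form_search(conn, "redshift > 2.0 AND mag_b < 20.0")
```

The second query selects on redshift and magnitude, which the fixed form cannot express however its fields are filled in.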
The use case GetReducedSpectra fails because the systems do not seem to
provide reduced spectra. They only deal in images and tables.
The systems don't seem to deal in identified usage: they do not establish
who the user is. Presumably, this means that they allow less access to data
than a given user is entitled to.
In general, the systems reviewed let you look up more easily data that
you could get by trawling through paper journals or by using interfaces
to individual large archives. They require you either to know what you
are looking for at the start (e.g. which catalogues to search) or to be prepared
to spend a long time browsing. The output of the search is as for searches
in paper collections: text you can read, but not machine-readable data products.
4.4.2 Conclusions
The systems studied here have a wealth of good features, many of which we need to emulate, but we were also able to identify a number of missing features and weaknesses in current systems which the VO alliance needs to address. These include:
- Searches over distributed resources are important but difficult, because of a lack of agreed standards for queries (both simple and advanced), for metadata, and for the results (both extracts from tables and from images).
- Resource discovery at present requires expert knowledge - a scalable resource discovery mechanism is needed.
- These web sites all support interactive queries, but few have any facilities for batching them up, e.g. to retrieve results from a list of interesting celestial positions.
- The ability to do cross-identifications between catalogues on different sites (via the fuzzy-join algorithm) is important, but facilities for this are rare and hard to use at present, and bandwidth may limit what can be done over the network.
- It is possible to construct services, such as Simbad, Vizier, and Aladin, which are separate but so well-linked that they appear as an integrated system, but these are exceptional and they are all co-located. If services on separate sites could be as well integrated, this would be a good step towards the VO.
- We need to consider how best to support the study of time-varying and transient phenomena, somewhat neglected at present.
- These archive sites used a variety of commercial and free DBMS (Sybase, Ingres, Oracle, SQL Server, MySQL, and probably others) as well as some home-grown database systems. Web Services interfaces will be needed for almost all of them.
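The cross-identification mentioned in the list above can be sketched as a positional match within a tolerance. This is a minimal brute-force illustration, which takes "fuzzy join" to mean pairing each source with its nearest counterpart within a given radius. The catalogue layout, names, and tolerance are assumptions. A practical service would need a spatial index and would take the positional errors of each source into account.

```python
import math

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def cross_match(cat_a, cat_b, tol_arcsec):
    """For each source in cat_a, find the nearest cat_b source within tolerance.

    Catalogues are lists of (name, ra_deg, dec_deg). Unmatched sources get None.
    Every pair is compared, so the cost grows as len(cat_a) * len(cat_b).
    """
    matches = []
    for name_a, ra_a, dec_a in cat_a:
        best_name, best_sep = None, tol_arcsec
        for name_b, ra_b, dec_b in cat_b:
            sep = separation_arcsec(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_name, best_sep = name_b, sep
        matches.append((name_a, best_name))
    return matches

# Hypothetical catalogues held at two different sites.
cat_a = [("a1", 10.0, 20.0), ("a2", 50.0, -5.0)]
cat_b = [("b1", 10.0002, 20.0001), ("b2", 120.0, 0.0)]
matches = cross_match(cat_a, cat_b, 2.0)
print(matches)
```

When the two catalogues are on different sites, one of them (or a suitable extract) has to be moved across the network before the comparison can be made, which is where the bandwidth limit arises.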
--
ClivePage - 28 Nov 2002