Database access through web services and the Grid: recent progress
There has been considerable progress in making databases
available on the Grid as web services.
- A draft specification of the interfaces is available.
- Prototype code to those interfaces exists.
- There is a funded project to develop production-quality code.
- A GGF working group has been proposed to standardize these interfaces.
Who?
The database work is a UK initiative. It is being steered
by the Database Task Force (DBTF) of the e-Science core
programme.
The funded project is called OGSA-DAI, where DAI
stands for Database Access and Integration. It is funded by
the core programme and gets matching "funding", in the form of
seconded developers, from IBM's lab at Hursley. I understand
that DBTF somehow steers OGSA-DAI but the exact relationship is
unclear to me.
The proposed working group has yet to be authorized by GGF.
When it comes into being, it will be an open, international
group for discussing the proposed interfaces and reference
implementations in preparation for them to become a Grid standard.
I understand that the WG will meet three times a year, at
the GGF conferences, and essentially anybody in the grid
community can ask to table documents and comments at those
meetings.
What?
The specification is available at the NeSC web-site. The available
paper is dated 1st February 2002, but I suspect that an update is due soon.
The interfaces provide low-level access to individual databases
via web-services of a class called DatabaseService. By low-level,
I mean that there is almost no abstraction of the form of the database.
To make a query, the software calling the web service has to first find
out the schema of the database and then must phrase the query to match
the schema. Abstraction, such as translating operands from UCDs to
names of actual columns, has to happen in the caller, not in the
web service.
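The caller-side work described above can be sketched as follows. This is only an illustration, not code from the specification: SQLite stands in for the remote database, and the table, the column names, the UCD-to-column map and the helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical mapping from UCDs to actual column names. The caller builds
# this itself; nothing here is defined by the DatabaseService interfaces.
UCD_TO_COLUMN = {
    "POS_EQ_RA_MAIN": "ra_deg",
    "POS_EQ_DEC_MAIN": "dec_deg",
    "PHOT_MAG_V": "vmag",
}


def discover_columns(conn, table):
    """Step 1: find out the schema (here via SQLite's PRAGMA table_info)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows]  # the column name is the second field


def build_query(table, columns, wanted_ucds, ucd_map):
    """Step 2: translate UCDs to column names and phrase the query."""
    selected = []
    for ucd in wanted_ucds:
        column = ucd_map.get(ucd)
        if column is None or column not in columns:
            raise KeyError(f"no column for UCD {ucd!r} in table {table!r}")
        selected.append(column)
    return f"SELECT {', '.join(selected)} FROM {table}"


def demo():
    """Run both steps against a small in-memory stand-in database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sources (ra_deg REAL, dec_deg REAL, vmag REAL)")
    conn.execute("INSERT INTO sources VALUES (10.5, -3.25, 12.1)")
    columns = discover_columns(conn, "sources")
    sql = build_query("sources", columns,
                      ["POS_EQ_RA_MAIN", "PHOT_MAG_V"], UCD_TO_COLUMN)
    return sql, conn.execute(sql).fetchall()
```

The point of the sketch is where the translation lives: the service only ever sees a query already phrased in terms of real columns.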
The basic operations are query, update, bulk-load and schema-update.
There is also an interface to allow transactions. Output of data
selected in queries is quite sophisticated: it uses a separate,
dynamically-generated web-service (a DeliveryService) which can use
special transports like GridFTP and can deliver data asynchronously.
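To make the shape of these operations concrete, here is a toy, in-process sketch. It assumes a great deal: the class and method names below are my own guesses rather than the names in the draft specification, SQLite replaces the real database, and a worker thread stands in for the separate delivery web-service and its transports.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


class DeliveryService:
    """Hypothetical stand-in: hands over query results asynchronously."""

    def __init__(self, future):
        self._future = future

    def done(self):
        return self._future.done()

    def fetch(self, timeout=None):
        # Blocks until the result is ready (or the timeout expires).
        return self._future.result(timeout=timeout)


class DatabaseService:
    """Hypothetical stand-in for the four basic operations."""

    def __init__(self):
        # One worker thread serializes all access to the connection.
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _in_transaction(self, work):
        def run():
            with self._conn:  # commit on success, roll back on error
                return work()
        return self._pool.submit(run)

    def schema_update(self, ddl):
        self._in_transaction(lambda: self._conn.execute(ddl)).result()

    def update(self, sql, params=()):
        return self._in_transaction(
            lambda: self._conn.execute(sql, params).rowcount).result()

    def bulk_load(self, sql, rows):
        self._in_transaction(
            lambda: self._conn.executemany(sql, rows)).result()

    def query(self, sql, params=()):
        # The caller gets a delivery handle straight away, not the data.
        future = self._pool.submit(
            lambda: self._conn.execute(sql, params).fetchall())
        return DeliveryService(future)

    def close(self):
        self._pool.shutdown(wait=True)
        self._conn.close()
```

A caller would invoke `query(...)`, carry on with other work, and later call `fetch()` on the returned handle. In the real design the delivery end is a separate web service, which this sketch does not attempt to model.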
The fine details of the interface are still quite fluid. In many
cases, DBTF is still debating the underlying semantics and trying
to understand how the interfaces best map to established techniques
for RDBMS.
The interfaces are designed to work with any kind of database.
Notably, they are expected to work with RDBMS using SQL and with
native-XML databases using XPath and XQuery. Presumably, the interfaces
could be made to work with object databases.
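As a small illustration of why the query language has to follow the kind of database, the same question ("which sources are of a given type?") is phrased below once in SQL and once in XPath. The data, element names and function names are invented for the example. SQLite and Python's ElementTree, which supports only a subset of XPath, stand in for a real RDBMS and a native-XML database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample data, held once as XML and once as rows.
XML_DOC = """
<catalogue>
  <source name="A" type="star"/>
  <source name="B" type="galaxy"/>
  <source name="C" type="star"/>
</catalogue>
"""
ROWS = [("A", "star"), ("B", "galaxy"), ("C", "star")]


def names_via_xpath(xml_text, kind):
    """XML flavour: select elements with an XPath attribute predicate."""
    root = ET.fromstring(xml_text)
    matches = root.findall(f".//source[@type='{kind}']")
    return [element.get("name") for element in matches]


def names_via_sql(rows, kind):
    """Relational flavour: the same question as a SQL SELECT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (name TEXT, type TEXT)")
    conn.executemany("INSERT INTO source VALUES (?, ?)", rows)
    cursor = conn.execute(
        "SELECT name FROM source WHERE type = ? ORDER BY name", (kind,))
    return [row[0] for row in cursor]
```

Both functions return the same names for the same data; only the phrasing of the query differs, which is the part the caller has to get right for each kind of database.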
There is already a partial implementation for an XML database (Xindice,
by Apache) using XPath and possibly XQuery. This is from Rob Baxter's
team at EPCC. At present, it exposes only web services and does not yet
do authentication with GSI or authorization. It's independent of
OGSA so far, but this will change.
An implementation for relational data (for any RDBMS
supporting the JDBC interfaces) is expected soon from IBM Hursley.
Both these reference implementations will be developed to beta-test
stage during spring and summer of 2002.
The interfaces are specified independently of OGSA at present, but many
of the features cover the same ground. The reference implementations
are expected to use OGSA features explicitly. The final specification
will presumably be for Grid services rather than just web services.
Where next?
These are the specifications and products of which AstroGrid is
supposed to be an early adopter. DBTF and OGSA-DAI will be seeking
serious feedback when their reference implementations reach the beta
stage.
In the meantime, to give us more time to react, I have arranged to
borrow a copy of the alpha version of the XML implementation from
EPCC. This is given on the understanding that there's no support
yet and that all details may change. I intend to set up a
demonstration of the DatabaseService that AstroGrid members can
try out. Suggestions for test data are welcome, but I'd thought to
load something relating to our ResourceCatalogue as a read-only
database.
In the summer, AstroGrid needs to look closely at the new facilities
and report back to DBTF. By extension, we seriously need to think
about the costs of fitting our system to OGSA-DAI. More prototypes
and demonstrations are indicated, especially of the RDBMS version.
When?
(These dates are from my rather-confused notes on meetings at NeSC
and may not be accurate.)
- Alpha "escape" of XML implementation: start of May.
- Proper Alpha release of XML implementation: end of May.
- Alpha release of RDBMS implementation: start of July.
- Statement of contents of final release: middle of May.
- Beta releases: about two months after alpha releases.
- Possible training course at NeSC: October.
Alpha releases would go to early adopters only. Some papers concerning
this work will go forward to GGF5, so comments from AstroGrid before then
would be timely.
-- GuyRixon - 22 Apr 2002
Thanks for doing that, Guy; your notes are more complete than mine in most areas. My notes have the dates for the public release of the OGSA-DAI software as:
- XML databases: 19th July
- Relational databases: 1st Sept.
But these are, of course, just estimates based on current progress.
-- Clive Page - 22 Apr 2002.