AstroGrid Architecture: Thesis
submission to AGOC meeting, 30-Jan-2003
Tony Linde
AstroGrid Project Manager
University of Leicester
Preamble to AGOC
AstroGrid does not yet have a documented architecture. This was
supposed to have been completed by the end of 2002, but other activities and a
lack of resources meant that it was not. The project has added a task to
its Iteration 01 (Jan 01 - Mar 31, 2003) plan to complete and document the
architecture, and this task has been assigned to the project's Technical Lead,
Keith Noddle.
In this document I will draw together a number of existing documents
from the AstroGrid wiki website [1]. These will demonstrate that, far from
having no architecture, the project has a well-developed and consistent idea of
an architecture, both in approach and content. I do not contend that this should
stand as a replacement for the architecture document; we still need that
document and are committed, now that resources are available, to getting it
done.
I would propose, however, that this thesis demonstrates sufficient
consistency of vision, and our track record sufficient ability and control over
the project, that the AGOC should recommend that the GSC release the full
complement of funding to the project so that we can proceed with recruitment
and systems development.
Deliverables to date
We have dealt with AstroGrid's concrete deliverables in considerable detail
over the past year, in particular in the 'Progress Against Goals' document
presented to the last AGOC meeting and its expanded version inserted into the
Phase A Report [2]. In addition, as part of the project control process we have
implemented, specific short-term goals were set and reported on for each work
package during Phase A [3].
Changes
Since the 'Progress Against Goals' section of the Phase A Report was produced,
a few changes should be noted:
- The one deliverable listed that has not been delivered is the
Architecture document (sections 11.2.3 and 11.4.1).
- In section 11.2.6, it was stated that we had failed to provide a working
grid. By the end of Q5, however, Guy Rixon and Patricio Ortiz had demonstrated
a web-service-based grid working between Leicester and Cambridge, accessing data
on a Cambridge server [4].
- Although, in section 11.3.7, we committed to investigating and possibly
using OGSA-DAI technology, there were significant delays in its development,
during which our enthusiasm waned. We have, however, once again accepted that
we have a role to play in the UK e-Science programme as well as in UK Astronomy,
and have assigned two people to develop the necessary interfaces with the OGSA
and OGSA-DAI projects and technology so that these can be incorporated into
AstroGrid as soon as feasible.
Science
Although AstroGrid is primarily a technical project, its goal is to make
science easier and more effective for astronomers. We have documented our
approach to the science in the Phase A Report, 'Science Analysis Summary' [5].
The use of science as a driver for formulating the system architecture proved
highly effective; the 'AstroGrid Ten' key science drivers [6] provide a constant
check, at every stage of the project, that we are producing a science-enabling
product.
NOTE: I presented this 'scientific' extension to the Unified Process to a forum of e-Science project managers at NeSC in December 2002, where it was well received [7].
Component model
AstroGrid has been envisaged as a component-based product from the very
earliest days and, to a large extent, the identified component blocks have
changed little, although we have since expanded the definitions of individual
components considerably. (One of the reasons we need a documented architecture
is that these details are spread throughout the project wiki.)
The latest incarnation of the component model is shown in the following diagram:
[Figure: AstroGrid component model]
Although this may look like a simplistic and relatively meaningless diagram, it
is a measure of the depth to which project members understand the architecture
that, in a two-day meeting, we were able to document how each of these
components would operate and how they would interact [8].
Phase B milestones
The component model is well enough understood to enable us, with a
fair degree of confidence, to provide the AGOC with a list of milestones
(aligned to Iteration end dates) and deliverables. At the end of each
iteration, the AGOC can measure progress by determining whether each aspect of
a component has been delivered as specified.
VO vision
Our approach to building a VO is predicated upon our
vision of what a VO should be:
- component based
This is not simply a reference to the fact that a VO might itself be
built of components, but to the idea that a user should be able to select
components from several VO developments in order to achieve their scientific goal.
- not a single system
The VO should not be one or a few interoperating systems. It
ought to be possible for anyone to create software which links to VO resources
and therefore 'looks like' a VO.
- open to all
Anyone should be able to make use of the VO and anyone should be
able to contribute resources - whether data or services - to the VO.
In a way, we see the VO as equivalent to the web. There is no single
system or organisation which owns it. It arises from many resources which
implement a few simple standards.
In the following sections, I will very briefly describe each of the components
that AstroGrid intends to add to the VO and outline the iteration deliverables
specific to each component.
Portal
The portal will be a server-based, web-delivered component through which all
VO user interfaces are delivered. Each component which requires user interaction
will do so through a portlet. The portal will allow the user to select and
arrange portlets on one or more web pages [11]. A portlet will conform to certain
XML-based protocols so that software can interact with the user through the
portal. Thus, any developer can add their own product's functionality into a
portal simply by wrapping it in a portlet.
In the first iteration, we will develop a simple version of the
portal with general portlets for logging in and providing news and a specific
one for submitting a query to the registry. The portal will be further
developed in later iterations, mainly through the provision of additional
portlets. Finally, a facility will be provided to enable other developers to
add their own software to the portal.
Community
This component will allow a group to construct an online community of
individuals and groups. A resource centre can then assign permission to use its
resources (data sets, services etc.) to one or more groups within a community
instead of having to name the individuals themselves. Within a community, the
administrator of the community can assign rights to individuals and groups,
including the right to add members and create groups. AstroGrid will base its
community design on the Globus CAS product and standards [10].
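The key point of the community model is that a resource centre grants rights to groups rather than to named individuals. A minimal Python sketch of that idea follows; the class and method names are hypothetical, administrator rights are not modelled, and it does not reflect the Globus CAS interface or its security mechanisms.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Community:
    """Hypothetical community: named groups, each a set of user ids."""
    name: str
    groups: Dict[str, Set[str]] = field(default_factory=dict)

    def add_member(self, group: str, user: str) -> None:
        # In the intended design only a community administrator (or a user given
        # that right) could do this; rights management is omitted here.
        self.groups.setdefault(group, set()).add(user)

    def groups_of(self, user: str) -> Set[str]:
        return {g for g, members in self.groups.items() if user in members}


@dataclass
class ResourceCentre:
    """Grants rights on each resource to (community, group) pairs, never to named users."""
    grants: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def grant(self, resource: str, community: Community, group: str) -> None:
        self.grants.setdefault(resource, set()).add((community.name, group))

    def may_use(self, resource: str, community: Community, user: str) -> bool:
        allowed = self.grants.get(resource, set())
        return any((community.name, g) in allowed for g in community.groups_of(user))


xray = Community("xray-collaboration")
xray.add_member("survey-team", "alice")
centre = ResourceCentre()
centre.grant("xray-catalogue", xray, "survey-team")
```

Adding another user to `survey-team` is then enough for that user to reach the catalogue; the resource centre's grants do not have to change.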
The management aspects of a community will be provided in a portlet
built in Iteration 02. At a later stage we will add the ability to exchange
information between two communities (by which time we hope to have some
agreement within IVOA on how to do this). Other aspects of user authorisation
will be handled under other components.
AstroPass
AstroPass is our name for a central server which will store user credentials,
in much the same way as the Microsoft Passport scheme [12]. Initially, we will
simply allow users to identify themselves with a username/password combination,
but later we will accept the upload of user certificates. Users will be able to
determine how much or how little information is exchanged with other VO portals.
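The user-controlled exchange of information could work along the lines of the following Python sketch, in which each AstroPass record holds, per portal, the attributes the user has agreed to release. The record structure, attribute names and portal name are assumptions for illustration; authentication itself (password or certificate checking) is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PassRecord:
    """Hypothetical AstroPass entry: user attributes plus a per-portal release policy."""
    username: str
    attributes: Dict[str, str]
    releases: Dict[str, Set[str]] = field(default_factory=dict)  # portal -> attribute names

    def allow(self, portal: str, *names: str) -> None:
        # The user, not the portal, decides which attributes may be released.
        self.releases.setdefault(portal, set()).update(names)

    def disclose(self, portal: str) -> Dict[str, str]:
        """Return only the attributes this user has agreed to release to `portal`."""
        permitted = self.releases.get(portal, set())
        return {k: v for k, v in self.attributes.items() if k in permitted}


record = PassRecord(
    "alice",
    {"email": "alice@example.org", "institute": "Leicester", "name": "Alice"},
)
record.allow("portal.example.org", "institute")
```

With this policy, a portal the user has never configured receives nothing, which is the conservative default.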
Workflow
This component will provide the core functionality of the VO, enabling the user
to construct complex workflows: adding queries and data analyses, uploading and
downloading data and algorithms, rendering the results as tables or images etc.
We will provide a Visio-style interface [15] for the user to drag and drop
components into a workflow, with the workflow engine being intelligent enough to
determine when the output of one component does not match the input of another
and to suggest a translator. Functional flows may be forked and joined under
specific conditions. At chosen points, the user can insert breakpoints where the
flow will be interrupted to allow the user to inspect the intermediate data or
carry out some manual process. As well as constructing complex workflows, the
user may also simply enter single queries to registries or datasets.
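The type check that lets the workflow engine spot a mismatch and suggest a translator might be sketched as follows. This is a hypothetical illustration: each step declares a single input and output type, the translator catalogue is invented, and forks, joins and breakpoints are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    input_type: str
    output_type: str


# Hypothetical translator catalogue: (from_type, to_type) -> translator step name.
TRANSLATORS: Dict[Tuple[str, str], str] = {
    ("VOTable", "FITS"): "votable-to-fits",
    ("CSV", "VOTable"): "csv-to-votable",
}


def check_links(steps: List[Step]) -> List[Tuple[str, str, Optional[str]]]:
    """Report each adjacent pair whose types differ, with a suggested translator (or None)."""
    problems = []
    for upstream, downstream in zip(steps, steps[1:]):
        if upstream.output_type != downstream.input_type:
            key = (upstream.output_type, downstream.input_type)
            problems.append((upstream.name, downstream.name, TRANSLATORS.get(key)))
    return problems


flow = [
    Step("registry-query", "query", "VOTable"),
    Step("cone-search", "VOTable", "VOTable"),
    Step("render-image", "FITS", "PNG"),
]
```

Here `check_links(flow)` flags only the link from `cone-search` to `render-image` and suggests `votable-to-fits`; where no translator is known, the suggestion is `None` and the user would have to resolve the mismatch.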
In the first iteration, the user will be able to enter a simple interactive
query against the registry (probably no more than 'return all datasets which
contain X-ray data'). Later iterations will add the ability to construct a
workflow, initially with only simple dataset queries and data movement but later
allowing the submission of more complex jobs.
Job Control
This component allows the user to inspect the status of a job which has been
submitted and, if desired, change the run parameters of that job. Initially we
will develop a simple job monitoring portlet but will add the ability to
interrupt and change jobs later.
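As an illustration of the job-control operations described above, the following Python sketch models a job as a small state machine in which parameters may only be changed while the job is queued or suspended. The states, method names and the `radius_deg` parameter are hypothetical.

```python
from enum import Enum
from typing import Dict


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUSPENDED = "suspended"
    COMPLETED = "completed"


class Job:
    """Hypothetical job record, as inspected and modified through a job-control portlet."""

    def __init__(self, job_id: str, params: Dict[str, float]) -> None:
        self.job_id = job_id
        self.params = dict(params)
        self.state = JobState.QUEUED

    def status(self) -> str:
        return f"{self.job_id}: {self.state.value}"

    def start(self) -> None:
        if self.state is not JobState.QUEUED:
            raise RuntimeError("only a queued job can be started")
        self.state = JobState.RUNNING

    def interrupt(self) -> None:
        if self.state is not JobState.RUNNING:
            raise RuntimeError(f"cannot interrupt a job that is {self.state.value}")
        self.state = JobState.SUSPENDED

    def change_params(self, **updates: float) -> None:
        # Parameters may only change while the job is not actually executing.
        if self.state not in (JobState.QUEUED, JobState.SUSPENDED):
            raise RuntimeError(f"cannot change a job that is {self.state.value}")
        self.params.update(updates)

    def resume(self) -> None:
        if self.state is not JobState.SUSPENDED:
            raise RuntimeError("only a suspended job can be resumed")
        self.state = JobState.RUNNING


job = Job("job-42", {"radius_deg": 0.5})
job.start()
job.interrupt()
job.change_params(radius_deg=1.0)
job.resume()
```

The first-iteration monitoring portlet would need only `status()`; the later interrupt-and-change capability corresponds to the remaining methods.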
Registry
In AstroGrid, the registry will be fine-grained, i.e. it will contain detailed
information about resources, allowing the user to locate a specific resource
without queries having to be sent to the resource centres themselves. In the
first stage, we will store simple dataset information (wavelength, number of
entries, data table schema) and allow simple queries. This will expand as we add
more data, more types of data, and more features to manage and query the
registry.
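The following Python sketch illustrates how a fine-grained registry can answer a simple query, such as finding all datasets containing X-ray data, from its own metadata without contacting any data centre. The record fields mirror the first-stage information listed above (waveband, number of entries, table schema), but the field names, identifiers and values are invented for illustration and are not a proposed registry schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical fine-grained registry record for one dataset."""
    identifier: str
    wavebands: Tuple[str, ...]
    n_rows: int
    columns: Tuple[str, ...]  # simplified table schema: column names only


# Illustrative entries only; identifiers and sizes are made up.
REGISTRY: List[DatasetEntry] = [
    DatasetEntry("ivo://example/xray-survey", ("X-ray",), 124_000, ("ra", "dec", "flux")),
    DatasetEntry("ivo://example/optical-survey", ("optical",), 2_500_000, ("ra", "dec", "mag_r")),
    DatasetEntry("ivo://example/multiwave", ("X-ray", "radio"), 8_000, ("ra", "dec", "flux", "s_1400")),
]


def find_datasets(waveband: str, required_column: str = "") -> List[str]:
    """Answer a simple query from registry metadata alone, without contacting data centres."""
    return [
        entry.identifier
        for entry in REGISTRY
        if waveband in entry.wavebands
        and (not required_column or required_column in entry.columns)
    ]
```

Because the schema is held in the registry, even a column-level question (for example, which X-ray datasets also carry a 1.4 GHz flux column) can be answered before any resource centre is contacted.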
Our original intention was to research the possibility of creating an
ontology-based registry, that is, one that could understand the context of a
query and process it accordingly. However, time and resource constraints have
led to this being dropped from the project.
In the first iteration, we will define a schema for the registry
and a simple query protocol. This will be done in parallel with the development
of IVOA standards and protocols through a working group which I have
volunteered to lead.
MySpace
MySpace will provide users and components with a virtual space in which data can
be stored temporarily or permanently. The end goal is that users need not know
where their data are stored but will be able to view a list of all of them,
organised in some folder structure. Ultimately, we expect that this will evolve
into something like the EBI's repository for published data, so that a user
could publish an article in a journal along with a URL indicating where the data
are located. That URL would be location independent so that the data could be
found long after the original server hosting them has passed to silicon heaven.
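The location independence described above amounts to publishing a permanent name and keeping a mapping from that name to the data's current location. A minimal, single-process Python sketch follows; the `myspace:` naming scheme, the class and the URLs are hypothetical, and a real service would need the mapping itself to be distributed and long-lived.

```python
from typing import Dict


class MySpaceResolver:
    """Hypothetical mapping from permanent, location-independent names to current URLs."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def publish(self, name: str, url: str) -> str:
        if name in self._locations:
            raise ValueError(f"{name} is already published")
        self._locations[name] = url
        return name  # the name, not the URL, is what would be cited in an article

    def move(self, name: str, new_url: str) -> None:
        # The data migrate to a new server; the published name stays valid.
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_url

    def resolve(self, name: str) -> str:
        return self._locations[name]


resolver = MySpaceResolver()
ref = resolver.publish("myspace:alice/paper-2003/table1", "http://old-server.example.org/t1.vot")
resolver.move(ref, "http://new-server.example.org/archive/t1.vot")
```

After the move, anyone holding the published name still reaches the data, which is the property the journal-citation scenario relies on.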
Initially, we will provide MySpace as a single, specifiable location but will
evolve this into a multi-site, publishable capability. In a later iteration, we
will provide a MySpace Explorer, with which all of a user's data can be listed no
matter where they are stored or in what format.
Data Centre
This component will present a standard interface for all access to a resource
centre (which would be a better name for it). Ultimately, a piece of software may
obtain a handle with which to access a dataset or service directly, but a data
centre may choose to channel access to all its resources through such an
interface. In the AstroGrid VO this will be the case, as we expect that most data
centres will initially want this sort of monitoring and control.
Initially, this component will simply allow access to a dataset. Later additions
will allow for data to be routed elsewhere and for data policies to be
implemented (checking against community user or group permissions). Finally,
some warehousing and intensive data mining facilities will be provided.
Dataset Access
This component will simply take a standard query from the data
centre, translate it into the form appropriate for the dataset and execute it,
returning the results to the data centre.
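The translation step can be illustrated with a small Python sketch that turns a dataset-independent query into SQL for one relational dataset, mapping standard column names onto local ones, and then runs it against an in-memory SQLite table. The `StandardQuery` structure, the column mapping and the table are assumptions for illustration; the standard query format itself is not defined here.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StandardQuery:
    """Hypothetical dataset-independent query, as it might arrive from the data centre."""
    columns: Tuple[str, ...]
    min_value: Dict[str, float]  # standard column name -> inclusive lower bound


def to_sql(query: StandardQuery, table: str, column_map: Dict[str, str]) -> Tuple[str, List[float]]:
    """Translate into this dataset's SQL, mapping standard column names to local ones.

    Table and column names come from the data centre's own configuration, not from
    the user; user-supplied values are passed separately as bound parameters.
    """
    select = ", ".join(column_map[c] for c in query.columns)
    conditions = [f"{column_map[c]} >= ?" for c in query.min_value]
    sql = f"SELECT {select} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, list(query.min_value.values())


# A toy local dataset whose column names differ from the standard ones.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (ra_deg REAL, dec_deg REAL, xflux REAL)")
conn.executemany("INSERT INTO src VALUES (?, ?, ?)", [(10.0, -5.0, 0.2), (11.0, -4.0, 3.5)])

query = StandardQuery(("ra", "dec"), {"flux": 1.0})
sql, params = to_sql(query, "src", {"ra": "ra_deg", "dec": "dec_deg", "flux": "xflux"})
rows = conn.execute(sql, params).fetchall()  # results handed back to the data centre
```

Only the column map and table name are specific to the dataset, so supporting a further relational dataset would, in this sketch, mean supplying a new mapping rather than new code.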
This simple functionality will be provided in the early iterations.
At a much later stage, we will add the ability to create a warehouse for storing
query results and the ability to run the user's own code to extract results.
Visualisation
Owing to time/resource constraints, we will not be undertaking
research into server-based, interactive visualisation as originally intended.
We do intend to provide the option for sending data to a tool which generates
an image from a set of data and loads it into the appropriate part of a web
page. We will also provide links with one or two desktop visualisation tools
and will publish documents which allow any other tool provider to adapt their
tool similarly.
Astronomical Tools
We will wrap a number of existing astronomical tools, those considered most
essential to creating workflows which fulfil our key science goals, so that they
may be added to a workflow or executed directly from the portal.
AstroMQ
This is still a speculative component. It was felt by some (mainly me) that some
form of asynchronous message queue facility would benefit communication between
components. This was mainly based on previous workflow experience [16]. Whether
this component survives into later iterations is still debatable. If it does, we
will probably simply implement one of the existing open source messaging tools.
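The benefit of asynchronous messaging is that a sending component need not wait for the receiver to be ready. The following in-process Python sketch uses `asyncio.Queue` to show the pattern: a producer posts results and a consumer handles them as they arrive, with `None` used as a hypothetical end-of-stream marker. A deployed AstroMQ would instead rely on an existing messaging product running between processes and machines.

```python
import asyncio
from typing import List, Optional


async def producer(queue: "asyncio.Queue[Optional[str]]", n_results: int) -> None:
    """E.g. a dataset-access component posting results as they become available."""
    for i in range(n_results):
        await queue.put(f"result-{i}")
    await queue.put(None)  # end-of-stream marker


async def consumer(queue: "asyncio.Queue[Optional[str]]", received: List[str]) -> None:
    """E.g. a workflow engine handling each message whenever it gets to it."""
    while True:
        message = await queue.get()
        if message is None:
            break
        received.append(message)


async def main() -> List[str]:
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue(maxsize=10)
    received: List[str] = []
    # Producer and consumer run concurrently; the queue decouples them.
    await asyncio.gather(producer(queue, 3), consumer(queue, received))
    return received


messages = asyncio.run(main())
```

The bounded queue (`maxsize=10`) also illustrates back-pressure: a fast producer is made to wait rather than overwhelm a slow consumer.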
Development approach
I'm sure the AGOC members are sick of hearing about the iterative and
incremental approach of the Unified Process. What I would like to highlight
here, though, is the incredibly dynamic nature of the field in which we are
working.
In the Virtual Observatory sphere, AstroGrid is one of the founder members of the
IVOA and the only VO project in the world which has started development [9]. At
the moment, only one standard has been defined for VO interoperability: VOTable.
There are likely to be many more, and AstroGrid must implement them if it is to
provide a working product for UK astronomers.
AstroGrid members are very active in the forums in which these standards will be
determined, so we hope to minimise any impact on the project.
In the Grid world, Globus, Microsoft and IBM are defining what a grid is and how
it will operate. About the only sure thing at this stage is that it will be web
service based. We have 'translated' a number of grid standards (e.g. the CAS idea
for authorisation [10]) and are tracking the rest to ensure that our architecture
and designs are not too far out of step.
In such an environment, the only sane approach is to adopt a Just-In-Time
philosophy. Decisions on technology, standards etc. will be delayed until we need
to make them. This is perfectly in line with the incremental approach. So,
although we will document our understanding of the architecture, the
documentation will be detailed where we define use cases, higher-level in the
sequence diagrams which document sets of use cases, and will only in rare
instances define the object model for an area. Design at the level of object
model and collaboration diagrams will only be undertaken during the iteration in
which those use cases are realised.
Organisation
The development of code in each iteration will be carried out by one or more
workgroups, each led by a workgroup leader, all under the control of the
Technical Lead, Keith Noddle. The goals for each iteration are reviewed by the
TSP [13], a group of long-term members of the project, one from each institute
(plus the Project Manager and Project Scientist), combining technical and
scientific skills. It is this group which determines the use cases to be
realised in each iteration and the number and make-up of each development
workgroup.
The TSP will also review progress throughout the iteration and will assess
whether the goals set at the beginning of the iteration have been achieved [14].
Summary
I hope that the above text and the document references which follow are enough
to convince the AGOC that the AstroGrid project does indeed have an architecture,
even if it is not yet realised in a single document. We will have that document
by the end of Iteration 01 (March 31st 2003). To delay the release of funds to
the project until after that date would mean another three-month delay in
recruiting the people required, impacting the amount of work we could undertake
in Iteration 02.
As I stated above, I would like to propose that the AGOC recommend to
the GSC that the final funding for
AstroGrid be released immediately.
Bibliography
[1] The AstroGrid wiki website can be found at: http://wiki.astrogrid.org/bin/view/Astrogrid/WebHome
[2] The original AGOC document is on the wiki at: http://wiki.astrogrid.org/pub/Astrogrid/OversightCommittee/AGOC2-PaperC.html, and the Phase A Report document at: http://wiki.astrogrid.org/bin/view/Astrogrid/RbProgressAgainstGoals
[3] The goals for each quarter were documented in a work package forecast, and progress against those goals in a work package report. All were linked from a wiki page at: http://wiki.astrogrid.org/bin/view/Astrogrid/WpReports
[4] An initial report of this 'grid' demonstration is at: http://wiki.astrogrid.org/pub/Astrogrid/TspMinutes01/data-centre-demo.txt
[5] See: http://wiki.astrogrid.org/bin/view/Astrogrid/RbScienceRequirementsSummary
[6] These key science drivers are listed at: http://wiki.astrogrid.org/bin/view/Astrogrid/ScienceProblems
[7] A PDF version of my slides is available from the NeSC site at: http://umbriel.dcs.gla.ac.uk/NeSC/general/talks/105/session2_2.pdf
[8] The meeting results are documented at: http://wiki.astrogrid.org/bin/view/Astrogrid/FocusVOUsage20021121
[9] The International Virtual Observatory Alliance (IVOA) has its website at: http://www.ivoa.net/, and we have some wiki-based documents at: http://wiki.astrogrid.org/bin/view/IVOA/WebHome
[10] The Globus definition of the Community Authorisation Server (CAS) is at: http://www.globus.org/security/CAS/, with a number of AstroGrid responses on the wiki, the latest being: http://wiki.astrogrid.org/bin/view/Astrogrid/CASDemo
[11] For a couple of examples of how a user can arrange the layout of a portal, refer to My Yahoo: http://uk.my.yahoo.com/ and NewsIsFree: http://www.newsisfree.com/
[12] An overview of Microsoft Passport is at: http://www.microsoft.com/netservices/passport/overview.asp
[13] The Technical Support Panel (TSP) is described at: http://wiki.astrogrid.org/bin/view/Astrogrid/TechnicalSupportPanel
[14] The first meeting of the TSP, to initiate Iteration 01, was held in Leicester on 6th January 2003; the minutes can be viewed at: http://wiki.astrogrid.org/bin/view/Astrogrid/TspMinutes01
[15] Microsoft Visio is a component-based diagramming tool which has been used as the front-end for workflow construction in a number of software suites; see: http://www.microsoft.com/office/visio/
[16] My experience was with the architecture of a commercial portal and e-commerce platform in which message queuing was vital. Among other products investigated was IBM MQ Series, now called WebSphere MQ; see: http://www-3.ibm.com/software/ts/mqseries/
--
TonyLinde - 21 Jan 2003