r3 - 20 May 2003 - 15:42:48 - TonyLinde

AstroGrid Architecture: Thesis

submission to AGOC meeting, 30-Jan-2003

Tony Linde
AstroGrid Project Manager
University of Leicester

Preamble to AGOC

AstroGrid does not yet have a documented architecture. It was supposed to have been completed by the end of 2002 but other activities and lack of resources meant that this was not done. The project has added a task to its Iteration 01 (Jan 01 - Mar 31, 2003) plan to complete and document the architecture and this task has been assigned to the project's Technical Lead, Keith Noddle.

In this document I will draw together a number of existing documents from the AstroGrid wiki website [1]. These will demonstrate that, far from having no architecture, the project has a well developed and consistent idea of an architecture, both in approach and content. I do not contend that this should stand as a replacement for the architecture; we still need the architecture and are committed, now that resources are available, to getting it done.

I would propose, however, that this thesis demonstrates sufficient consistency of vision and our track record demonstrates sufficient ability and control over the project that the AGOC should recommend that the GSC release the full complement of funding to the project so that we can proceed with recruitment and systems development.

Deliverables to date

We have dealt with AstroGrid's concrete deliverables in much detail over the past year, in particular the 'Progress Against Goals' document presented to the last AGOC meeting and its expanded version inserted into the Phase A Report [2]. In addition, as part of the project control process we have implemented, specific short-term goals were set and reported on for each work package during Phase A [3].

Changes

Since producing the Phase A Report, 'Progress Against Goals', a few changes need to be noted:

The one item of the deliverables listed that has not been delivered is the Architecture document (sections 11.2.3 and 11.4.1).

In section 11.2.6, it was stated that we had failed to provide a working grid. By the end of Q5, however, Guy Rixon and Patricio Ortiz had demonstrated a web service based grid working between Leicester and Cambridge, accessing data on a Cambridge server [4].

Although, in section 11.3.7, we committed to investigating and possibly using OGSA-DAI technology, there were significant delays in its development, during which our enthusiasm waned. We have, however, once again accepted that we have a role to play in the UK e-Science programme as well as in UK Astronomy, and have assigned two people to work on developing the necessary interfaces with OGSA and OGSA-DAI projects and technology so that they can be incorporated into AstroGrid as soon as feasible.

Science

Although AstroGrid is primarily a technical project, its goal is to make science easier and more effective for astronomers. We have documented our approach to the science in the Phase A Report, 'Science Analysis Summary' [5]. The use of science as a driver for formulating the system architecture proved highly effective; the 'AstroGrid Ten' key science drivers [6] provide a constant check for us at every stage of the project that we are producing a science enabling product.

NOTE: I presented this 'scientific' extension to the Unified Process to a forum of e-Science project managers at NeSC in December 2002 [7], where it was well received.

Component model

AstroGrid has been envisaged as a component-based product from the very earliest days, and, to a large extent, the identified component blocks have not changed much, though we have expanded the definition of individual components much more. (One of the reasons we need a documented architecture is that these details are spread throughout the project wiki.)

The latest incarnation of the component model is:

Although this looks like a simplistic and relatively meaningless diagram, it is a measure of the depth to which project members understand the architecture that we were able to document, in a two-day meeting [8], how each of these components would operate and how they would interact.

Phase B milestones

The component model is well enough understood to enable us, with a fair degree of confidence, to provide the AGOC with a list of milestones (aligned to Iteration end dates) and deliverables. At the end of each iteration, the AGOC can measure progress by determining whether each aspect of a component has been delivered as specified.

VO vision

Our approach to building a VO is predicated upon our vision of what a VO should be:

  • component based

    This is not simply a reference to the fact that a VO might be built of components itself but that a user should be able to select components from several VO developments in order to achieve their scientific goal.

  • not a single system

    The VO should not be one or a few interoperating systems. It ought to be possible for anyone to create software which links to VO resources and therefore 'looks like' a VO.

  • open to all

    Anyone should be able to make use of the VO and anyone should be able to contribute resources - whether data or services - to the VO.

In a way, we see the VO as equivalent to the web. There is no single system or organisation which owns it. It arises from many resources which implement a few simple standards.

In the following sections, I will very briefly describe each of the components that AstroGrid intends to add to the VO and allude to the iteration deliverables specific to each component.

Portal

The portal will be a server-based web-delivered component through which all VO user interfaces are delivered. Each component which requires user interaction will do so through a portlet. The portal will allow the user to select and arrange portlets on one or more web pages [11]. A portlet will conform to certain XML-based protocols so that software can interact with the user through the portal. Thus, any developer can add their own product's functionality into a portal simply by wrapping it in a portlet.

In the first iteration, we will develop a simple version of the portal with general portlets for logging in and providing news and a specific one for submitting a query to the registry. The portal will be further developed in later iterations, mainly through the provision of additional portlets. Finally, a facility will be provided to enable other developers to add their own software to the portal.

Community

This component will allow a group to construct an online community of individuals and groups. A resource centre can then assign permission to use its resources (data sets, services etc.) to one or more groups within a community instead of having to name the individuals themselves. Within a community, the administrator of the community can assign rights to individuals and groups, including the right to add members and create groups. AstroGrid will base its community design on the Globus CAS product and standards [10].

The management aspects of a community will be provided in a portlet built in Iteration 02. At a later stage we will add the ability to exchange information between two communities (by which time we hope to have some agreement within IVOA on how to do this). Other aspects of user authorisation will be handled under other components.
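
The group-based permission model described above might be sketched as follows. This is only an illustration under stated assumptions: the class and method names (Community, ResourceCentre, grant, may_use) and the example users and resources are hypothetical, and nothing here is taken from the Globus CAS product or from an AstroGrid design.

```python
class Community:
    """Hypothetical community: maps group names to sets of member names."""

    def __init__(self, name):
        self.name = name
        self.groups = {}

    def add_member(self, group, user):
        # Create the group on first use, then add the user to it.
        self.groups.setdefault(group, set()).add(user)

    def groups_of(self, user):
        return {g for g, members in self.groups.items() if user in members}


class ResourceCentre:
    """Hypothetical resource centre that grants access to groups, not individuals."""

    def __init__(self):
        # resource name -> set of (community name, group name) pairs
        self.permissions = {}

    def grant(self, resource, community, group):
        self.permissions.setdefault(resource, set()).add((community.name, group))

    def may_use(self, resource, community, user):
        # A user may use a resource if any of their groups has been granted it.
        allowed = self.permissions.get(resource, set())
        return any((community.name, g) in allowed
                   for g in community.groups_of(user))


# Illustrative data only.
leicester = Community("leicester")
leicester.add_member("xray-group", "alice")
leicester.add_member("students", "bob")

centre = ResourceCentre()
centre.grant("xray-catalogue", leicester, "xray-group")
```

In this sketch the resource centre never names alice directly; adding another member to xray-group would extend access without any change at the resource centre.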

AstroPass

The AstroPass is our name for a central server which will store user credentials, in much the same way as the Microsoft Passport scheme [12]. Initially, we will simply allow users to identify themselves with a username/password combination but later will accept the upload of user certificates. A user can determine how much or how little information is exchanged with other VO portals.

Workflow

This component will provide the core functionality of the VO, enabling the user to construct complex workflows, adding queries and data analyses, uploading and downloading data and algorithms, rendering the results as tables or images etc. We will provide a Visio-style interface [15] for the user to drag and drop components to a workflow, with the workflow engine being intelligent enough to determine when the output of one component does not match the input of another and to suggest a translator. Functional flows may be forked and joined under specific conditions. At points, the user can insert breakpoints where the flow will be interrupted to allow the user to inspect the intermediate data or carry out some manual process. As well as constructing complex workflows, the user may also simply enter single queries to registries or datasets.

In the first iteration, the user will be able to enter a simple interactive query against the registry (probably no more than 'return all datasets which contain X-Ray data'). Later iterations will add the ability to construct a workflow, initially with only simple dataset queries and data movement but later allowing the submission of more complex jobs.
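
The idea of a workflow engine that notices when one component's output does not match the next component's input, and suggests a translator, can be sketched roughly as below. The Step type, the format labels, the step names and the translator table are all assumptions made for illustration; the actual workflow design is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical workflow step with a declared input and output format."""
    name: str
    input_format: str
    output_format: str


# Hypothetical translators, keyed by (from_format, to_format).
TRANSLATORS = {
    ("votable", "fits"): "votable-to-fits",
    ("fits", "votable"): "fits-to-votable",
}


def check_workflow(steps):
    """Return (position, message) for every link whose formats do not match."""
    problems = []
    for i in range(len(steps) - 1):
        produced = steps[i].output_format
        expected = steps[i + 1].input_format
        if produced != expected:
            translator = TRANSLATORS.get((produced, expected))
            if translator:
                hint = f"insert translator '{translator}'"
            else:
                hint = "no known translator"
            problems.append(
                (i, f"{steps[i].name} -> {steps[i + 1].name}: {hint}")
            )
    return problems


# Illustrative workflow: the second step expects a format the first does not produce.
workflow = [
    Step("registry-query", "query", "votable"),
    Step("source-extractor", "fits", "votable"),
    Step("plot", "votable", "png"),
]
problems = check_workflow(workflow)
```

Here the check flags only the link between the first two steps and proposes the votable-to-fits translator; a real engine would presumably also need to handle forks, joins and breakpoints.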

Job Control

This component allows the user to inspect the status of a job which has been submitted and, if desired, change the run parameters of that job. Initially we will develop a simple job monitoring portlet but will add the ability to interrupt and change jobs later.

Registry

In AstroGrid, the registry will be fine-grained, so it will contain detailed information about resources, allowing the user to locate a specific resource without queries having to be sent to the resource centres themselves. In the first stage, we will store simple dataset information (wavelength, number of entries, data table schema) and allow simple queries. This will expand as we add more and more data and types of data and features to manage and query the registry.

Our original intention was to research the possibility of creating an ontology-based registry, that is, one that could understand the context of a query and process it accordingly. However, time and resource constraints have led to this being dropped from the project.

In the first iteration, we will define a schema for the registry and a simple query protocol. This will be done in parallel with the development of IVOA standards and protocols through a working group which I have volunteered to lead.
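
As a rough sketch of what a first-stage registry entry and a simple query might look like, the example below stores the three items mentioned above (wavelength, number of entries, data table schema) and filters on them. The field names, the dataset names and all values are invented placeholders; the real schema and query protocol are still to be defined.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """Hypothetical registry record for one dataset."""
    dataset: str
    wavelength: str                 # e.g. "x-ray" or "optical"
    entries: int                    # number of rows in the dataset
    schema: dict = field(default_factory=dict)  # column name -> column type


# Placeholder data, not real datasets.
REGISTRY = [
    RegistryEntry("survey-a", "x-ray", 120000,
                  {"ra": "double", "dec": "double", "flux": "double"}),
    RegistryEntry("survey-b", "optical", 5000000,
                  {"ra": "double", "dec": "double", "mag": "float"}),
    RegistryEntry("survey-c", "x-ray", 800,
                  {"ra": "double", "dec": "double"}),
]


def find_datasets(registry, wavelength=None, has_column=None):
    """Return names of datasets matching every criterion that is given."""
    results = []
    for entry in registry:
        if wavelength is not None and entry.wavelength != wavelength:
            continue
        if has_column is not None and has_column not in entry.schema:
            continue
        results.append(entry.dataset)
    return results


# Roughly: 'return all datasets which contain X-Ray data'.
xray = find_datasets(REGISTRY, wavelength="x-ray")
```

Because the schema is held in the registry, the second filter (has_column) can be answered without contacting any resource centre, which is the point of keeping the registry fine-grained.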

MySpace

MySpace will provide users and components with a virtual space in which data can be temporarily and permanently stored. The end goal is that the user need not know where data are stored but will be able to view a list of all of them, organised in some folder structure. Ultimately, we expect that this will evolve into something like the EBI's repository for published data, so a user might publish an article in a journal along with a URL to where the data might be located. That URL would be location independent so that the data could be found long after the original server hosting it has passed to silicon heaven.

Initially, we will provide MySpace as a single and specifiable location but will evolve this into a multi-site, publishable capability. In a later iteration, we will provide a MySpace Explorer, with which all a user's data can be listed no matter where it is stored nor what format it is stored in.
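
The notion of a location-independent reference can be illustrated with a small resolver that maps a stable logical identifier to whatever physical location currently holds the data. Everything in this sketch is assumed for illustration: the MySpaceResolver class, the "myspace://" identifier scheme and the example.org hosts are hypothetical.

```python
class MySpaceResolver:
    """Hypothetical lookup from stable logical identifiers to physical URLs."""

    def __init__(self):
        self._locations = {}  # logical id -> current physical URL

    def store(self, logical_id, physical_url):
        self._locations[logical_id] = physical_url

    def move(self, logical_id, new_physical_url):
        # The logical id stays the same; only the physical location changes.
        if logical_id not in self._locations:
            raise KeyError(logical_id)
        self._locations[logical_id] = new_physical_url

    def resolve(self, logical_id):
        return self._locations[logical_id]

    def listing(self, prefix=""):
        # A folder-style view: all identifiers beneath a given prefix.
        return sorted(i for i in self._locations if i.startswith(prefix))


resolver = MySpaceResolver()
resolver.store("myspace://alice/results/run1.vot",
               "http://server-one.example.org/data/0001")
resolver.store("myspace://alice/images/field3.fits",
               "http://server-one.example.org/data/0002")

# The original server is retired; the published identifier does not change.
resolver.move("myspace://alice/results/run1.vot",
              "http://server-two.example.org/archive/0001")
```

A published article would cite only the logical identifier, so the reference keeps working after the data have been moved.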

Data Centre

This component will present a standard interface to all access to a resource centre (which would be a better name for it). Ultimately a piece of software may get a handle to access a dataset or service directly but a data centre may choose to channel access to all its resources through such an interface. In the AstroGrid VO, this will be the case as we expect most data centres will initially want this sort of monitoring and control.

Initially, this component will simply allow access to a dataset. Later additions will allow for data to be routed elsewhere and for data policies to be implemented (checking against community user or group permissions). Finally, some warehousing and intensive data mining facilities will be provided.

Dataset Access

This component will simply take a standard query from the data centre, translate it into the form appropriate for the dataset and execute it, returning the results to the data centre.

This simple functionality will be provided in the early iterations. At a much later stage, we will add the ability to create a warehouse for storing query results and the ability to run the user's own code to extract results.
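
The translate-and-execute step described above might look something like the following sketch, which uses Python's built-in sqlite3 module as a stand-in for a dataset's own database. The shape of the 'standard query' (a dictionary with "select" and "ranges"), the column mapping and the table contents are all assumptions for illustration; no standard query format has been defined in this document.

```python
import sqlite3

# Hypothetical mapping from standard column names to this dataset's own names.
COLUMN_MAP = {"ra": "RA_DEG", "dec": "DEC_DEG", "flux": "FLUX_CGS"}


def translate(query, table):
    """Turn a standard query dict into parameterised SQL for one dataset.

    The table name is assumed to come from trusted configuration,
    not from the user.
    """
    columns = ", ".join(COLUMN_MAP[c] for c in query["select"])
    sql = f"SELECT {columns} FROM {table}"
    clauses = []
    params = []
    for column, (low, high) in query.get("ranges", {}).items():
        clauses.append(f"{COLUMN_MAP[column]} BETWEEN ? AND ?")
        params.extend([low, high])
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def execute(connection, query, table):
    """Translate the standard query, run it, and return the rows."""
    sql, params = translate(query, table)
    return connection.execute(sql, params).fetchall()


# Placeholder dataset held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (RA_DEG REAL, DEC_DEG REAL, FLUX_CGS REAL)")
conn.executemany("INSERT INTO sources VALUES (?, ?, ?)",
                 [(10.0, -5.0, 1.5), (200.0, 30.0, 0.2)])

standard_query = {"select": ["ra", "dec"], "ranges": {"ra": (0.0, 90.0)}}
rows = execute(conn, standard_query, "sources")
```

The data centre would only ever see the standard form; the mapping to local column names stays inside the dataset access component.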

Visualisation

Owing to time/resource constraints, we will not be undertaking research into server-based, interactive visualisation as originally intended. We do intend to provide the option for sending data to a tool which generates an image from a set of data and loads it into the appropriate part of a web page. We will also provide links with one or two desktop visualisation tools and will publish documents which allow any other tool provider to adapt their tool similarly.

Astronomical Tools

We will wrap a number of existing astronomical tools, those considered most essential to creating workflows which fulfill our key science goals, so that they may be added into a workflow or executed directly from the portal.

AstroMQ

This is still a speculative component. It was felt by some (mainly me) that some form of asynchronous message queue facility would benefit communication between components. This was mainly based on previous workflow experience [16]. Whether this component survives into later iterations is still debatable. If so, we will likely choose simply to implement one of the existing open source messaging tools.
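
To show what asynchronous messaging between components means in practice, here is a minimal in-process sketch using Python's standard queue and threading modules. It is not a proposal for AstroMQ itself, which would more likely be an existing messaging product; the message fields and component names are hypothetical.

```python
import queue
import threading


def consumer(inbox, handled):
    """Process messages until a None sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:
            inbox.task_done()
            break
        # A real component would act on the message; here we just record it.
        handled.append(f"{message['to']} handled {message['type']}")
        inbox.task_done()


inbox = queue.Queue()
handled = []
worker = threading.Thread(target=consumer, args=(inbox, handled))
worker.start()

# The sender does not wait for each message to be processed.
inbox.put({"to": "job-control", "type": "job-submitted"})
inbox.put({"to": "myspace", "type": "results-ready"})
inbox.put(None)  # sentinel: tell the consumer to stop

inbox.join()   # block until every queued item has been marked done
worker.join()
```

The sender and receiver are decoupled: the sender queues its messages and carries on, while the receiver works through them in order at its own pace.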

Development approach

I'm sure the AGOC members are sick of hearing about the iterative and incremental approach of the Unified Process. What I would like to highlight here, though, is the incredibly dynamic nature of the field in which we are working.

In the Virtual Observatory sphere, AstroGrid is one of the founder members of the IVOA and the only VO project in the world which has started development [9]. At the moment, only one standard has been defined for VO interoperability: VOTable. There are likely to be many more and AstroGrid must implement them if it is to provide a working product for UK astronomers. AstroGrid members are very active in the forums in which these standards will be determined so we hope to minimise any impact on the project.

In the Grid world, Globus, Microsoft and IBM are defining what a grid is and how it will operate. About the only sure thing at this stage is that it will be web service based. We have 'translated' a number of grid standards (e.g. the CAS idea for authorisation [10]) and are tracking the rest to ensure that our architecture and designs are not too out of step.

In such an environment, the only sane approach is to adopt a Just-In-Time philosophy. Decisions on technology, standards etc will be delayed until we need to make them. This is perfectly in line with the incremental approach. So, although we will document our understanding of the architecture, it will be detailed where we define use cases, more high level for the sequence diagrams which document sets of use cases, and will only in rare instances define the object model for an area. Design at the level of object model and collaboration diagrams will only be undertaken during the iteration in which those use cases are realised.

Organisation

The development of code in each iteration will be carried out by one or more workgroups, each led by a workgroup leader, all under the control of the Technical Lead, Keith Noddle. The goals for each iteration are reviewed by the TSP [13], a group of long-term members of the project, one from each institute (plus the Project Manager and Project Scientist), combining technical and scientific skills. It is this group which determines the use cases to be realised each iteration and the number and make-up of each development workgroup.

The TSP will also review progress throughout the iteration and will assess whether the goals set at the beginning of the iteration have been achieved [14].

Summary

I hope that the above text and the document references which follow are enough to convince the AGOC that the AstroGrid project does indeed have an architecture, even if it is not yet realised in a single document. We will have that document by the end of Iteration 01 (March 31st 2003). To delay the release of funds to the project until after that date would mean another three month delay in recruiting the people required, impacting the amount of work we could undertake in Iteration 02.

As I stated above, I would like to propose that the AGOC recommend to the GSC that the final funding for AstroGrid be released immediately.

Bibliography

[1] AstroGrid wiki website can be found at: http://wiki.astrogrid.org/bin/view/Astrogrid/WebHome.

[2] The original AGOC document is on the wiki at: http://wiki.astrogrid.org/pub/Astrogrid/OversightCommittee/AGOC2-PaperC.html, and the Phase A Report document at: http://wiki.astrogrid.org/bin/view/Astrogrid/RbProgressAgainstGoals.

[3] The goals for each quarter were documented in a work package forecast and progress against those goals in a work package report. All were linked from a wiki page at: http://wiki.astrogrid.org/bin/view/Astrogrid/WpReports.

[4] An initial report of this 'grid' demonstration is at: http://wiki.astrogrid.org/pub/Astrogrid/TspMinutes01/data-centre-demo.txt.

[5] See: http://wiki.astrogrid.org/bin/view/Astrogrid/RbScienceRequirementsSummary.

[6] These key science drivers are listed at: http://wiki.astrogrid.org/bin/view/Astrogrid/ScienceProblems.

[7] A pdf version of my slides is available from the NeSC site at: http://umbriel.dcs.gla.ac.uk/NeSC/general/talks/105/session2_2.pdf.

[8] The meeting results are documented at: http://wiki.astrogrid.org/bin/view/Astrogrid/FocusVOUsage20021121.

[9] The International Virtual Observatory Alliance (IVOA) has its website at: http://www.ivoa.net/, and we have some wiki-based documents at: http://wiki.astrogrid.org/bin/view/IVOA/WebHome.

[10] The Globus definition of Community Authorisation Server (CAS) is at: http://www.globus.org/security/CAS/, with a number of AstroGrid responses on the wiki, the latest being: http://wiki.astrogrid.org/bin/view/Astrogrid/CASDemo.

[11] For a couple of examples of how a user can arrange the layout of a portal, refer to: My Yahoo: http://uk.my.yahoo.com/; NewsIsFree: http://www.newsisfree.com/.

[12] An overview of Microsoft Passport is at: http://www.microsoft.com/netservices/passport/overview.asp.

[13] The Technical Support Panel (TSP) is described at: http://wiki.astrogrid.org/bin/view/Astrogrid/TechnicalSupportPanel.

[14] The first meeting of the TSP, to initiate Iteration 01, was held in Leicester on 6th January 2003; the minutes can be viewed at: http://wiki.astrogrid.org/bin/view/Astrogrid/TspMinutes01.

[15] Microsoft Visio is a component-based diagramming tool which has been used as the front-end for workflow construction in a number of software suites; see: http://www.microsoft.com/office/visio/

[16] My experience was with the architecture of a commercial portal and e-commerce platform in which message queuing was vital. Among other products investigated was IBM MQ Series, now called WebSphere MQ; see: http://www-3.ibm.com/software/ts/mqseries/.

-- TonyLinde - 21 Jan 2003

Astrogrid.ArchitectureThesis moved from Main.TonyArchitectureThesis on 07 Feb 2003 - 11:33 by TonyLinde - put it back