r22 - 29 Dec 2002 - 17:27:00 - TonyLinde

PhaseAReport

(3) Architecture Overview

(3.1) Introduction

This document presents a brief overview of the AstroGrid Architecture. A system Architecture is a high-level description, in a formal language (we use the Unified Modelling Language, UML (1)), of the system to be built. It focuses on the key decisions about structure, components and technology on which the success of the project depends. The design documents for all components of the Virtual Observatory (VO), as well as the plans and milestones for the build phase of the project, will be derived from the architecture.

The Architecture consists of a number of documents stored on the AstroGrid wiki (2), many of them linked from the ArchitectureDocs (3) page (most of the references below are hyperlinks to wiki pages). Given the interactive nature of the wiki, the Architecture is clearly not a static, monolithic document: it reflects the constantly evolving nature of the project.

The architecture-related documents include discussion papers, technology assessments, reports etc. From these, we have developed more formal documents, including use cases, technology choices, analysis and design models, etc.

Although the Architecture is not complete, enough is now known about the requirements for a VO, and sufficient high level design has been carried out to enable estimates for funding and personnel to be generated. Detailed estimates are provided in another part of this RedBook. Here we show milestones in the development of AstroGrid components over the two years, 2003/4, to enable project progress to be tracked.

The rest of this document consists of:

  • Approach: an outline of the approach taken to develop the architecture
  • Use Cases: our approach is driven by use cases which specify the system-oriented content of the AstroGrid VO
  • Conceptual Model: a whole-system model which enabled development of science-based use cases
  • Services Model: a component-based model of the VO
  • Technology Demonstrations: an overview of several subprojects which tested aspects of new technology
  • Technology Choices: technologies which have been chosen for the project

(3.2) Approach

Early in the project, it was decided (following a recommendation by the Project Manager) that we should follow the Unified Process methodology (4). In fact, AstroGrid has contributed its own variant of the Unified Process (UPeSc (5)).

The UP is both iterative and incremental. Use cases are first documented from the general requirements. This allows the creation of an outline model of the architecture, including subsystems, classes and components. The use cases are then incorporated into the model, documented as Sequence Diagrams (7); this leads to changes in both the architecture and the use cases, and so the model changes and evolves.

Within the AstroGrid project, the Project Scientist, together with other members of the project and outside astronomers, contributed a large number of Science Problems (6): examples of the kinds of science that a VO is expected to enable. These are covered in more detail in another part of the RedBook. The use of science problems was our main addition to the Unified Process. Ten of these science problems were chosen as those which the AstroGrid VO should enable astronomers to tackle (8).

In order to visualise how this science might be conducted, a Conceptual Model (see below) of a Virtual Observatory was developed. This model documented the concepts important to the VO domain (topic or subject area) and how they were related. More importantly, it allowed sequence diagrams to be created for key science problems, which was critical to understanding how a VO should operate. Eventually, enough was understood about the type of VO that AstroGrid would build for the team to move on to a more realistic model.

We were considering adopting a web service-based approach when the Globus team announced plans for version 3 of its toolkit, Globus OGSA (9), in which the grid would be built from web service-based components (hence 'grid services'). We decided to support this move and committed the project to the new approach. We will, however, err on the side of caution and ensure that the services within the AstroGrid system can work independently of OGSA, or with another commercial grid implementation if one is developed before the project completes.

The next-generation model of the AstroGrid VO therefore defined components as services, and component invocation as messages between those components: the Services Model (see below).

(3.3) Use Cases

The Unified Process describes itself in three key phrases (19) as:
  • Use Case Driven
  • Architecture-Centric
  • Iterative and Incremental

This whole document deals with the architecture, and another document in the Phase A Reports, the Phase B Plan, will describe how we will implement the iterative and incremental aspects of the UP. In this section we describe our approach to Use Cases.

A use case is a scenario in which a user (or an agent or other piece of software) initiates an action which leads to some benefit to the user: eg a result is returned or the software is put into a state which will enable another action to take place. In parallel with the definition of science problems, we also defined use cases which would resolve the science problems. In general, a science problem involved the execution of several use cases, and most use cases were applicable to several science problems (20).

Some examples:

  • AuthenticateIdentity:
    in which a gatekeeper checks the identity of a user (or agent) certificate and verifies whether it is trusted;
  • MySpaceStoreResults:
    in which an astronomer is presented with the option of storing a dataset resulting from a query in the MySpace area;
  • UploadUserCode:
    in which a user uploads their own code to a computer on the grid so as to run a specific analysis on data held there.

Over the next three months, and into 2003 for the later components of the architecture, we will continue to write use cases and realise them as sequence diagrams.

As well as assisting in defining the requirements of the system, use cases have two other important functions:

  • iteration functionality:
    Before beginning a period of building software, a number of use cases are chosen as those which will be realised during that period. Component software will then be developed or enhanced so that the use cases can be undertaken.
  • test cases:
    Each use case is also a test case. After building software, the team, and users, will check the software to ensure that the use cases are correctly realised.
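To illustrate the second function, each use case chosen for an iteration can be treated as an executable acceptance check. This is a minimal Python sketch under stated assumptions. `UseCase`, `run_iteration` and the in-memory MySpace stand-in are hypothetical names invented for illustration, not AstroGrid code. A real check would exercise deployed services rather than a local object.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class UseCase:
    """A named use case paired with a check that the built software realises it."""
    name: str
    check: Callable[[], bool]


def run_iteration(use_cases: List[UseCase]) -> Dict[str, bool]:
    """Run each use case chosen for an iteration as an acceptance test."""
    results = {}
    for uc in use_cases:
        try:
            results[uc.name] = bool(uc.check())
        except Exception:
            # A scenario that cannot run counts as not (yet) realised.
            results[uc.name] = False
    return results


class InMemoryMySpace:
    """Hypothetical stand-in for the MySpace component built in this iteration."""

    def __init__(self):
        self._objects = {}

    def store(self, name, data):
        self._objects[name] = data

    def names(self):
        return sorted(self._objects)


def check_myspace_store_results() -> bool:
    # Scenario: a query result is stored and then appears in the user's MySpace.
    space = InMemoryMySpace()
    space.store("query-result.vot", b"<VOTABLE/>")
    return "query-result.vot" in space.names()


def check_upload_user_code() -> bool:
    # Scheduled for a later iteration, so there is nothing to exercise yet.
    raise NotImplementedError("UploadUserCode not built yet")


results = run_iteration([
    UseCase("MySpaceStoreResults", check_myspace_store_results),
    UseCase("UploadUserCode", check_upload_user_code),
])
print(results)  # {'MySpaceStoreResults': True, 'UploadUserCode': False}
```

A use case that fails or cannot yet run shows up as `False`. The same report can then be used to decide which use cases still need work in the next iteration.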

(3.4) Conceptual Model

The Conceptual Model (10) is a whole-system model, ie it looks at the VO as if it were a single system. This is useful in the early stages of a project to allow analysts to model the dynamic behaviour of a system without worrying about the separation of objects into components. It is also referred to as a Domain Model. (Note: one unexpected benefit of the conceptual model was that the concepts listed provided a good starting point for the Ontology Demonstration - see later.)

The domain model is too large to show here. A reduced picture will show the size and scope of this model (if you are viewing this document online, click the picture to view the full-size model):

Modelling the system concepts as classes allowed us to model the dynamic behaviour of the science problems in sequence diagrams (11). This proved a key endeavour as it allowed both the development of system use cases and the further elaboration of the domain model. One example of a sequence diagram is shown here:

[Figure: example sequence diagram (browndwarfSD2.gif)]

The conceptual model served as the basis for developing the services model. The concepts - those that were key to the architecture - were partitioned into components (12) which would be delivered as web/grid services.

(3.5) Services Model

Key to any modelling enterprise is the creation of models which look at the system from different viewpoints. The Services Model (13) starts from a component view of the system and then looks at the interactions required between those components in order to deliver the required functionality.

The next step of this modelling workflow is to take the sequence diagrams developed under the conceptual model and re-engineer them using the component services. This will determine the properties and methods that each service needs to implement. Detailed design for each component will not be done until the build phase of the project.

The creation of the services model has only just begun (it is expected to be complete by the end of December 2002). The services in the model to date include:

  • Activity Log
  • Analysis Tools
  • Application Resource
  • AQL Translator
  • CAS Server
  • Compute Resource
  • Data Mining
  • Database Export
  • Dataset Access
  • Data Router
  • Job Control
  • Job Estimator
  • Job Scheduler
  • MySpace
  • Query Estimator/Optimizer
  • Replica Builder
  • Resource Registry
  • User Notification
  • User Preferences
  • Workflow

In addition to these services, we envisage developing a web-based Portal and a PC-based Client program, which will enable users to discover resources and construct jobs to run on the Virtual Observatory, as well as a web-based Log Analyzer to provide resource analyses. The following gives a simplified overview of the linkages between these services:

As an indication of the greater detail required in the services model, the following is based on the sequence diagram above but recast in terms of services. So far only the first two flows in that diagram have been modelled:

Some of the core services are described below.

(3.5.1) Compute Resource

This is an abstract service (ie, one which is never deployed in its own right but serves as a template for other services). It defines a standard set of properties and methods to be implemented by every service which provides access to computing resources. This will enable another service to discover what facilities are available on a computer and how they can be accessed. Methods will be provided for uploading and executing user-written code that makes use of those facilities.
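The abstract-service idea maps naturally onto an abstract base class. The template declares the methods, and each concrete compute service must implement them before it can be instantiated. The following is a minimal Python sketch. The method names (`describe_facilities`, `upload_code`, `execute`) and the `ToyClusterResource` implementation are hypothetical, since the actual interface is still to be designed.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ComputeResource(ABC):
    """Abstract service: a template that concrete compute services implement.

    Method names are illustrative assumptions, not a defined AstroGrid API.
    """

    @abstractmethod
    def describe_facilities(self) -> Dict[str, str]:
        """Report what the machine offers and how each facility is accessed."""

    @abstractmethod
    def upload_code(self, name: str, source: str) -> str:
        """Accept user-written code and return a handle for running it."""

    @abstractmethod
    def execute(self, handle: str, args: List[str]) -> str:
        """Run previously uploaded code and return its output."""


class ToyClusterResource(ComputeResource):
    """Hypothetical implementation used only to show the pattern."""

    def __init__(self):
        self._uploads: Dict[str, str] = {}

    def describe_facilities(self) -> Dict[str, str]:
        return {"cpus": "16", "scheduler": "batch", "access": "grid service"}

    def upload_code(self, name: str, source: str) -> str:
        handle = f"job-{len(self._uploads) + 1}:{name}"
        self._uploads[handle] = source
        return handle

    def execute(self, handle: str, args: List[str]) -> str:
        # A real service would sandbox and schedule the code; this only reports.
        source = self._uploads[handle]
        return f"ran {handle} ({len(source)} bytes) with {args}"


resource = ToyClusterResource()
handle = resource.upload_code("fit.py", "print('hi')")
print(resource.describe_facilities()["cpus"])  # 16
print(resource.execute(handle, ["--band", "K"]))
```

Because `ComputeResource` is abstract, instantiating it directly raises `TypeError`. A client written against the template can therefore use any concrete implementation interchangeably.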

(3.5.2) Application Resource

This is also an abstract service. Implementations of this service will provide access to applications pre-installed at service sites. For example, the AstroGrid team is currently wrapping the SExtractor tool in a web service, which in future would be expected to implement the Application Resource interface. Other likely examples are interfaces to IDL installations, visualisation tools etc.
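Here is a sketch of how a wrapped application might sit behind the Application Resource template. It assumes, hypothetically, that an implementation translates a service request into the local command line. The `ApplicationResource` methods are invented for illustration. The `sex` command, the `-c` configuration-file option and the `-PARAMETER value` overrides follow SExtractor's usual command-line form. The wrapper only builds the command and does not run it.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ApplicationResource(ABC):
    """Abstract service for an application pre-installed at a service site.

    Method names are illustrative assumptions, not a defined interface.
    """

    @abstractmethod
    def describe(self) -> Dict[str, str]:
        """Name the application and the parameters a caller may set."""

    @abstractmethod
    def build_command(self, inputs: List[str], params: Dict[str, str]) -> List[str]:
        """Translate a service request into the site's local invocation."""


class SExtractorResource(ApplicationResource):
    """Hypothetical wrapper: builds (but does not run) a SExtractor command."""

    # Only parameters the site chooses to expose are accepted.
    ALLOWED = {"CATALOG_NAME", "DETECT_THRESH", "DETECT_MINAREA"}

    def __init__(self, config_file: str = "default.sex"):
        self.config_file = config_file

    def describe(self) -> Dict[str, str]:
        return {"application": "SExtractor",
                "parameters": ", ".join(sorted(self.ALLOWED))}

    def build_command(self, inputs: List[str], params: Dict[str, str]) -> List[str]:
        unknown = set(params) - self.ALLOWED
        if unknown:
            raise ValueError(f"unsupported parameters: {sorted(unknown)}")
        cmd = ["sex", *inputs, "-c", self.config_file]
        for key in sorted(params):  # deterministic order
            cmd += [f"-{key}", params[key]]
        return cmd


cmd = SExtractorResource().build_command(
    ["image.fits"], {"DETECT_THRESH": "1.5", "CATALOG_NAME": "out.cat"})
print(" ".join(cmd))  # sex image.fits -c default.sex -CATALOG_NAME out.cat -DETECT_THRESH 1.5
```

Restricting callers to an allow-list of parameters is one way a site could expose a pre-installed tool without giving remote users arbitrary command-line access.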

(3.5.3) Resource Registry

This is the heart of the VO. The registry will contain a description of, and a link to, every resource in the VO. For AstroGrid, the decision was taken to implement a fine-grained registry, so that as many queries as possible about available resources can be answered from the registry rather than being forwarded to the resources themselves. The registry service will provide a set of properties and methods which enable any resource to be discovered in multiple ways. Metadata for each resource will be drawn from an astronomical ontology (see the ontology trials described below), allowing a linked inference engine to discover resources relevant to a user's enquiry.
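The following minimal sketch illustrates the two ideas in this paragraph. First, metadata queries are answered from the registry itself. Second, ontology relations let a request for a broad term also find resources described with narrower terms. The ontology fragment, resource names and example.org URLs are hypothetical. A simple transitive walk over 'narrower term' links stands in for a real inference engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical ontology fragment: each term maps to its narrower terms.
NARROWER: Dict[str, Set[str]] = {
    "photometry": {"optical photometry", "infrared photometry"},
    "infrared photometry": {"near-infrared photometry"},
}


def expand(term: str) -> Set[str]:
    """Return the term together with every narrower term beneath it."""
    found: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(NARROWER.get(t, ()))
    return found


@dataclass
class Resource:
    name: str
    url: str
    subjects: Set[str] = field(default_factory=set)  # ontology terms
    metadata: Dict[str, str] = field(default_factory=dict)


class ResourceRegistry:
    def __init__(self):
        self._resources: List[Resource] = []

    def register(self, resource: Resource) -> None:
        self._resources.append(resource)

    def find_by_subject(self, term: str) -> List[str]:
        # A broad term also matches resources tagged with narrower terms.
        wanted = expand(term)
        return [r.name for r in self._resources if r.subjects & wanted]

    def find_by_metadata(self, key: str, value: str) -> List[str]:
        # Answered from the registry's own metadata, without contacting resources.
        return [r.name for r in self._resources if r.metadata.get(key) == value]


registry = ResourceRegistry()
registry.register(Resource("nir-survey", "http://example.org/nir",
                           {"near-infrared photometry"}, {"waveband": "K"}))
registry.register(Resource("optical-survey", "http://example.org/opt",
                           {"optical photometry"}, {"waveband": "R"}))
registry.register(Resource("xray-archive", "http://example.org/xray",
                           {"x-ray spectroscopy"}, {"waveband": "X-ray"}))
print(registry.find_by_subject("infrared photometry"))  # ['nir-survey']
print(registry.find_by_subject("photometry"))           # ['nir-survey', 'optical-survey']
```

The richer the metadata held in the registry, the more queries can be answered this way without touching the resources themselves. That is the motivation for a fine-grained registry.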

(3.5.4) CAS Server

CAS (Community Authorization Service) (14) is a Globus development which allows the groups that a person belongs to on the grid to be specified and certified. Each community will be able to define groups and members and assign rights to each. Data centres will then authorise access to their resources by communities, groups or individuals. AstroGrid will develop its own implementation of a CAS server, together with a user interface via client or portal software.
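The division of responsibility described above can be sketched as a rights-resolution function. Communities define groups and members, and data centres grant rights to communities, groups or individuals. This is a hypothetical Python model only. It ignores certificates and the signed credentials a real CAS server issues, and the class names and rights strings are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Community:
    """A community defines its own groups and members."""
    name: str
    members: Set[str] = field(default_factory=set)
    groups: Dict[str, Set[str]] = field(default_factory=dict)

    def add_to_group(self, user: str, group: str) -> None:
        self.members.add(user)
        self.groups.setdefault(group, set()).add(user)

    def groups_of(self, user: str) -> Set[str]:
        return {g for g, members in self.groups.items() if user in members}


@dataclass
class ResourcePolicy:
    """Rights a data centre grants on one resource: by community, group or user."""
    by_community: Dict[str, Set[str]] = field(default_factory=dict)
    by_group: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    by_user: Dict[str, Set[str]] = field(default_factory=dict)

    def rights(self, community: Community, user: str) -> Set[str]:
        granted = set(self.by_user.get(user, set()))
        if user in community.members:
            granted |= self.by_community.get(community.name, set())
            for group in community.groups_of(user):
                granted |= self.by_group.get((community.name, group), set())
        return granted


astro = Community("astrogrid")
astro.add_to_group("alice", "xray-team")
astro.add_to_group("bob", "students")
policy = ResourcePolicy(
    by_community={"astrogrid": {"read"}},
    by_group={("astrogrid", "xray-team"): {"query"}},
    by_user={"carol": {"read"}},
)
print(sorted(policy.rights(astro, "alice")))  # ['query', 'read']
```

The community manages membership, while the data centre manages only the mapping from communities and groups to rights. Neither side needs to know the other's internal details.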

(3.5.5) Workflow

This service will be used mainly by the user interface programs, both web-based and PC-based. It enables a user to create a programme of work, create jobs within that programme, add tasks to a job, and then submit the job and monitor its progress. The user may also create a job 'template', so that a set of tasks can be rerun many times (possibly varying one or two parameters).
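This is a minimal sketch of the programme/job/task hierarchy, showing how a job template could be rerun with one parameter varied. The class names, task names and parameters are hypothetical. Submission and monitoring are left to Job Control.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    name: str
    params: Dict[str, str] = field(default_factory=dict)


@dataclass
class Job:
    name: str
    tasks: List[Task] = field(default_factory=list)

    def add_task(self, task: Task) -> "Job":
        self.tasks.append(task)
        return self


@dataclass
class Programme:
    name: str
    jobs: List[Job] = field(default_factory=list)


def from_template(template: Job, name: str,
                  overrides: Dict[str, Dict[str, str]]) -> Job:
    """Copy a job template, replacing parameters of the named tasks."""
    tasks = [Task(t.name, {**t.params, **overrides.get(t.name, {})})
             for t in template.tasks]
    return Job(name, tasks)


template = (Job("crossmatch-template")
            .add_task(Task("query", {"catalogue": "survey-A", "mag_limit": "18"}))
            .add_task(Task("crossmatch", {"radius_arcsec": "2"})))

programme = Programme("example-programme")
for radius in ("1", "2", "5"):
    # Rerun the same tasks, varying only the match radius.
    programme.jobs.append(
        from_template(template, f"run-r{radius}",
                      {"crossmatch": {"radius_arcsec": radius}}))
print([job.tasks[1].params["radius_arcsec"] for job in programme.jobs])  # ['1', '2', '5']
```

Each job built from the template gets its own copy of the task parameters. Editing one run therefore never alters the template or the other runs.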

(3.5.6) Job Control

This is the part of the workflow service which controls the submission and monitoring of a job. It will also detect the completion of each task within a job and submit the next task in the workflow.
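Job Control's main rule can be sketched as a small state machine: submit a task, wait for its completion report, then submit the next. In this hypothetical Python sketch, the `submit` callback stands in for dispatch to a remote service. Completion reports arrive through `on_task_finished`, and a failed task stops the job.

```python
from typing import Callable, List


class JobControl:
    """Submits a job's tasks in order, advancing only when a task reports completion.

    `submit` is a hypothetical hook standing in for dispatch to a remote service.
    """

    def __init__(self, task_names: List[str], submit: Callable[[str], None]):
        self.task_names = task_names
        self.submit = submit
        self.next_index = 0
        self.status = "pending"

    def start(self) -> None:
        if not self.task_names:
            self.status = "completed"
            return
        self.status = "running"
        self._submit_next()

    def on_task_finished(self, name: str, ok: bool) -> None:
        """Handle a completion report from the service running `name`."""
        if self.status != "running":
            raise RuntimeError(f"job is {self.status}, not running")
        if name != self.task_names[self.next_index - 1]:
            raise ValueError(f"unexpected completion of {name!r}")
        if not ok:
            self.status = "failed"
        elif self.next_index < len(self.task_names):
            self._submit_next()
        else:
            self.status = "completed"

    def _submit_next(self) -> None:
        name = self.task_names[self.next_index]
        self.next_index += 1
        self.submit(name)


submitted = []
job = JobControl(["query", "crossmatch", "plot"], submitted.append)
job.start()
job.on_task_finished("query", ok=True)
job.on_task_finished("crossmatch", ok=True)
job.on_task_finished("plot", ok=True)
print(submitted, job.status)  # ['query', 'crossmatch', 'plot'] completed
```

The `status` field is what a monitoring interface, and in turn User Notification, would read to report progress.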

(3.5.7) User Notification

This service will provide access to a range of notification services. A user can elect to be notified about the progress of a job (when certain tasks finish, when final or intermediate results are available, etc.) and can also specify how to be notified. This will typically be by email, but might also be via a notification web page that the user logs into, or even by SMS text message to a mobile phone.
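This is a sketch of per-user routing of job events to the channels each user has chosen. The event names and channel names are hypothetical. The senders are stubs that record messages rather than sending email or SMS.

```python
from typing import Callable, Dict, List, Set

Sender = Callable[[str, str], None]  # (user, message) -> None


class UserNotification:
    """Routes job events to the channels each user has chosen."""

    def __init__(self):
        self.channels: Dict[str, Sender] = {}
        self.prefs: Dict[str, Dict[str, Set[str]]] = {}  # user -> event -> channels

    def add_channel(self, name: str, send: Sender) -> None:
        self.channels[name] = send

    def subscribe(self, user: str, event: str, channels: List[str]) -> None:
        unknown = set(channels) - set(self.channels)
        if unknown:
            raise ValueError(f"unknown channels: {sorted(unknown)}")
        self.prefs.setdefault(user, {})[event] = set(channels)

    def notify(self, user: str, event: str, message: str) -> List[str]:
        """Send `message` on every channel the user chose for `event`."""
        chosen = sorted(self.prefs.get(user, {}).get(event, set()))
        for name in chosen:
            self.channels[name](user, message)
        return chosen


outbox = []
service = UserNotification()
for channel in ("email", "web", "sms"):
    # Stub senders only record what would be delivered.
    service.add_channel(channel, lambda user, msg, ch=channel: outbox.append((ch, user, msg)))
service.subscribe("alice", "results_ready", ["email", "sms"])
service.notify("alice", "results_ready", "Job 7: final results available")
service.notify("alice", "task_finished", "Task 2 done")  # alice did not ask for this
print(outbox)
```

New delivery methods can be added by registering another sender. The routing logic does not need to change.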

(3.5.8) AQL Translator

This is less of a 'core' service but is worth including here because of the concept of AQL, an Astronomical Query Language. It has long been recognised in astronomy that the standard SQL language used to query databases is inadequate for many astronomical tasks. It is hoped that we might begin development of an AQL as part of this project, in conjunction with partner VO projects. What form it might take, or how it might be put to use, is still very much open. The AQL Translator service will parse an AQL query to determine which catalogs or datasets need to be queried, and will create the corresponding SQL queries.

(3.5.9) Dataset Access

This is an abstract service which will sit in front of every accessible dataset. This service is critical to the success of AstroGrid. Many of these datasets are (or will soon be) overwhelmingly large, and AstroGrid will provide the astronomer with the means of selecting a subset of data on which analysis can be performed. A dataset might be a catalog, a collection of FITS files or any other set of astronomical data. The service will describe how a subset of data is to be selected and retrieved (eg the SQL variant to be used, dbms type etc.). It will also process queries to the dataset and return results, or pointers to where the results are located if they are too large to move.
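The following sketch shows the Dataset Access pattern, using Python's built-in `sqlite3` module as a stand-in dataset. The service describes its query dialect and runs each query next to the data. It returns rows directly only when the result is small; otherwise it returns a pointer to a scratch table. The inline row limit and scratch-table naming are assumptions. The sketch executes the query text as given, whereas a real service would validate it first.

```python
import sqlite3
from typing import Any, Dict


class DatasetAccess:
    """Front end to one dataset: returns rows, or a pointer when results are large.

    The inline row limit and scratch-table naming are illustrative assumptions.
    """

    def __init__(self, conn: sqlite3.Connection, max_inline_rows: int = 1000):
        self.conn = conn
        self.max_inline_rows = max_inline_rows
        self._scratch_count = 0

    def describe(self) -> Dict[str, str]:
        # Tells callers which dbms and SQL variant to use for selections.
        return {"dbms": "SQLite", "dialect": "SQLite SQL",
                "version": sqlite3.sqlite_version}

    def query(self, sql: str) -> Dict[str, Any]:
        self._scratch_count += 1
        table = f"scratch_{self._scratch_count}"
        # Materialise the result beside the data rather than moving it at once.
        self.conn.execute(f"CREATE TEMP TABLE {table} AS {sql}")
        (n,) = self.conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if n <= self.max_inline_rows:
            rows = self.conn.execute(f"SELECT * FROM {table}").fetchall()
            self.conn.execute(f"DROP TABLE {table}")
            return {"rows": rows}
        return {"pointer": f"temp.{table}", "row_count": n}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (id INTEGER, ra_deg REAL, dec_deg REAL)")
conn.executemany("INSERT INTO sources VALUES (?, ?, ?)",
                 [(i, 10.0 + 0.01 * i, -5.0) for i in range(50)])
access = DatasetAccess(conn, max_inline_rows=10)
small = access.query("SELECT id FROM sources WHERE id < 3")  # 3 rows: returned inline
big = access.query("SELECT * FROM sources")                   # 50 rows: pointer returned
print(big)  # {'pointer': 'temp.scratch_2', 'row_count': 50}
```

Returning a pointer lets a later step, such as the Data Router or an analysis tool running at the same site, work on the result without first shipping it across the network.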

(3.5.10) Data Mining

This service will provide an interface to intensive data mining tools. To address the key science problems, we will need more than simple data selection tools. Some methods of data selection will require, for example, intensive statistical analyses to be performed while the selection is taking place, continually altering or tuning the selection criteria. We will develop some of these mining tools to enable the AstroGrid VO to prove the concept. More importantly, we will create the interfaces and standards which will allow others to create similar tools.

(3.5.11) Data Router

This service will provide data movement facilities. Query results could be moved from the dataset scratch space to a user's permanent storage; data could be moved to a more powerful machine, or to more complex software for detailed analysis, etc. This service will need to be 'aware' of its environment. A move between two locations on the same machine should be done with a simple copy, while a move between separate sites should use the most efficient method available: eg GridFTP if both sites support the protocol, FTP if not.
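The transfer-method rule stated above is simple enough to write down directly. In this hypothetical sketch, an endpoint records its host and the protocols its site supports. The protocol labels are plain strings, and no transfer is actually performed.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class Endpoint:
    host: str
    path: str
    protocols: FrozenSet[str] = field(default_factory=frozenset)


def choose_transfer(src: Endpoint, dst: Endpoint) -> str:
    """Pick a transfer method: local copy, then GridFTP, falling back to FTP."""
    if src.host == dst.host:
        return "copy"      # same machine: a simple file copy
    if "gridftp" in src.protocols and "gridftp" in dst.protocols:
        return "gridftp"   # both sites support GridFTP
    return "ftp"           # otherwise fall back to plain FTP


scratch = Endpoint("dc1.example.org", "/scratch/q1", frozenset({"gridftp", "ftp"}))
home = Endpoint("dc1.example.org", "/home/alice/q1")
hpc = Endpoint("hpc.example.org", "/work/q1", frozenset({"gridftp"}))
legacy = Endpoint("old.example.org", "/pub/q1", frozenset({"ftp"}))
print(choose_transfer(scratch, home), choose_transfer(scratch, hpc),
      choose_transfer(scratch, legacy))  # copy gridftp ftp
```

Keeping the decision in one function means new transfer methods can be slotted in later without changing the callers.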

(3.5.12) Server-based Analysis Tools

As with the data mining tools, we will require some analysis tools to be developed as a proof of concept. The key issue is that these tools must operate in a service-based, server environment. We will also develop the interfaces and standards that will allow other such tools to be developed.

(3.5.13) MySpace

This service is perhaps the most interesting new concept developed by the AstroGrid project. It allows the user to 'own' space on the VO. This space could be distributed across many computers and disks, transparently to the user. For example, a user might run a query on a dataset and have the results stored on that machine (provided security constraints are satisfied). The results might then be transferred to another machine when required as input to another application, yet when the user looks at their MySpace directory they see only a single object. A user will also be able to give others access to any object in their MySpace, and might make an object publicly readable, for instance when publishing the results in a paper. The service will provide methods for reserving space, adding objects to MySpace, listing a user's or group's objects etc.
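This is a minimal model of MySpace as a logical namespace. Each object a user sees maps to a hidden physical location, which can change without altering the user's view. Access can be extended to named users or to everyone. The method names, the per-user reservation and the host names are hypothetical; group listings and real storage are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Entry:
    location: str  # physical "host:/path", hidden from the user
    size: int      # bytes
    readers: Set[str] = field(default_factory=set)
    public: bool = False


class MySpace:
    """Per-user logical namespace over distributed storage (hypothetical model)."""

    def __init__(self):
        self.reserved: Dict[str, int] = {}
        self.entries: Dict[Tuple[str, str], Entry] = {}  # (owner, name) -> entry

    def reserve(self, user: str, nbytes: int) -> None:
        self.reserved[user] = self.reserved.get(user, 0) + nbytes

    def used(self, user: str) -> int:
        return sum(e.size for (owner, _), e in self.entries.items() if owner == user)

    def add(self, user: str, name: str, location: str, size: int) -> None:
        if self.used(user) + size > self.reserved.get(user, 0):
            raise ValueError("not enough reserved MySpace")
        self.entries[(user, name)] = Entry(location, size)

    def move(self, user: str, name: str, new_location: str) -> None:
        # The data changes machine; the user's view of MySpace does not change.
        self.entries[(user, name)].location = new_location

    def list_objects(self, user: str) -> List[str]:
        return sorted(name for (owner, name) in self.entries if owner == user)

    def share(self, user: str, name: str, other: str) -> None:
        self.entries[(user, name)].readers.add(other)

    def publish(self, user: str, name: str) -> None:
        self.entries[(user, name)].public = True

    def can_read(self, reader: str, owner: str, name: str) -> bool:
        entry = self.entries[(owner, name)]
        return reader == owner or entry.public or reader in entry.readers


space = MySpace()
space.reserve("alice", 10_000)
space.add("alice", "q1.vot", "dc1.example.org:/scratch/q1.vot", 4_000)
space.move("alice", "q1.vot", "hpc.example.org:/work/q1.vot")  # e.g. for analysis
space.share("alice", "q1.vot", "bob")
print(space.list_objects("alice"))                 # ['q1.vot']
print(space.can_read("carol", "alice", "q1.vot"))  # False
space.publish("alice", "q1.vot")
```

After the move, the user still sees a single object named `q1.vot`. Only the hidden location has changed.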

(3.6) Technology Demonstrations

AstroGrid has set up a number of technology trials. These are designed to test new ideas, check the feasibility of new technologies or simply get a head start on probable components of the VO. The Pilot Projects (described in another part of the Phase A Report) were the most significant demonstrations. Smaller trials were also set up, and are still underway (15). These are:

  • CAS Server
    The goal was to produce a working CAS server (see above for explanation of CAS), enabling the creation of a community with groups of members having different access rights to a number of resources.
  • Ontology trials
    The goals are to produce: a first draft, skeletal, AstroOntology, incorporating UCD and VizieR information; a registry of (a few) UK-based astronomical catalogues, each described in ontology-based terms; an ontology-based registry access method; and an ontology-based workflow, driving a registry access web site.
  • DBTF Technologies
    The goal was to assess OGSA-DAI (16) technologies for access to XML and relational databases.
  • AVO Science Demo
    This is the AstroGrid contribution to the AVO science demonstration scheduled for January 2003. At this stage, our effort will concentrate on producing a web service which wraps the SExtractor tool and provides methods to be used by a modified version of the Aladin service accessing GOODS data.
  • Data Centre trials
    The goal is to demonstrate the issues involved in providing a web service front-end to an astronomical data centre and its archives. At this stage, it will focus on the Cambridge and Leicester data centres.
  • Working Grid
    The goals are to establish a working grid with: at least one machine at each of five sites running Globus 2 and able to GridFTP between each site; and at least one web service deployed at each site.

(3.7) Technology Choices

A number of technology choices have been made within the project (17). The first decision was to forgo the existing Globus grid technologies and embrace the (then new) concept of grid services within the Globus OGSA effort. We felt that web services offered significant benefits to future VOs. They were in line with the direction of the W3C and industry. Components could be packaged as discrete entities running on servers, without the library conflicts that come with client-based programs. Replacement components could be developed, deployed and slotted into astronomers' workflows with minimal effort.

The next choice was the development and deployment platform, where the obvious candidates were the .Net platform and the Java platform. Although .Net offers technical advantages over Java, it is currently available only on Windows machines and is relatively new. For those reasons, and because of the greater availability of Java developers, we decided to adopt the Java platform. We nevertheless expect to be able to make use of web services deployed on .Net within our workflows.

The most significant technology choice still outstanding is the database platform. We have evaluated several open source and commercial databases (18):

  • MySQL
  • PostgreSQL
  • Oracle
  • Microsoft SQLServer
  • IBM DB2 (still being investigated)

No choice has yet been made, but we are likely to select one or more to cover the following broad requirements:

  • small internal tables
  • MySpace
  • data warehouse and data mining

Whatever our choices for the project, it is our firm intention that all data access will be via industry standard libraries (eg Java JDO) so that different databases can be used by those who choose to implement the AstroGrid VO.

(3.7.1) Open Source development

We are committed to developing the AstroGrid components in an Open Source way. This means that the source code will be freely available for anyone to download and use. We have not yet selected a license but will probably choose one in the style of the LGPL, Apache or Berkeley (BSD) licenses (21), all of which permit the source code to be used in other open source products as well as in commercial products.

Whether we also allow other people to participate in the project development process, by contributing changes to the code, is an issue we have not yet addressed. If people outside the project do express a wish to participate in the coding of our components, we will look at their request carefully.

(3.8) References

(1) UML Explained, Kendall Scott, Addison-Wesley, 2001

(2) Wiki: this is a web-based tool which allows any registered user to modify a set of pages. The AstroGrid project uses the wiki for all document storage. An explanation can be found at: http://wiki.astrogrid.org/bin/view/Main/WebHome.

(3) http://wiki.astrogrid.org/bin/view/Astrogrid/ArchitectureDocs

(4) See the brief explanation at: http://wiki.astrogrid.org/bin/view/Escience/UnifiedProcess.

(5) See the explanation at: http://wiki.astrogrid.org/bin/view/Escience/UPeSc.

(6) See the full list at: http://wiki.astrogrid.org/bin/view/VO/ScienceProblemList.

(7) A Sequence Diagram is a UML tool which shows a sequence of object interactions in time-ordered manner. In this context, it allowed the team to visualise how a VO would work as an astronomer used it on specific science problems.

(8) These ten key science problems are documented at: http://wiki.astrogrid.org/bin/view/Astrogrid/ScienceProblems.

(9) OGSA (Open Grid Services Architecture): effectively Globus Toolkit v3, this is a proposed evolution of the current Globus Toolkit towards a Grid system architecture based on an integration of Grid and Web services concepts and technologies. See http://www.globus.org/ogsa/

(10) Conceptual Model: downloadable in document form as AGProjectReport_d2.doc from http://wiki.astrogrid.org/bin/view/Astrogrid/ArchitectureDocs, or as zipped Together directory as astrogrid20020809.zip.

(11) See: http://wiki.astrogrid.org/bin/view/Astrogrid/SequenceDiagrams

(12) This took place over a two day meeting in Leicester: the outcome of the meeting is documented at http://wiki.astrogrid.org/bin/view/Astrogrid/ArchitectureMeeting20020819 and the list of services at http://wiki.astrogrid.org/bin/view/Astrogrid/GridServiceList

(13) Services Model: downloadable as a zipped Together directory, AGServices.zip, from http://wiki.astrogrid.org/bin/view/Astrogrid/ArchitectureDocs.

(14) Community Authorization Service (CAS): see Globus page at: http://www.globus.org/Security/CAS/

(15) See http://wiki.astrogrid.org/bin/view/Astrogrid/DemoProjects.

(16) OGSA-DAI (http://umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI/) is a subproject of the Globus OGSA (see above) effort. Initially started by the DBTF (Database Task Force, one of the UK e-Science teams: http://umbriel.dcs.gla.ac.uk/NeSC/general/teams/), it later became a working group of the GGF (Global Grid Forum: http://www.globalgridforum.org/6_DATA/dais.htm).

(17) See http://wiki.astrogrid.org/bin/view/Astrogrid/TechnologyDocs.

(18) See http://wiki.astrogrid.org/bin/view/Astrogrid/DbmsEvaluations.

(19) The key reference manual for the Unified Process is: The Unified Software Development Process, Ivar Jacobson, Grady Booch, James Rumbaugh, Addison-Wesley, 1999. The three key phrases are explained from p4 onwards.

(20) The AstroGrid wiki lists two sets of use cases. In the VO web, http://wiki.astrogrid.org/bin/view/VO/UseCaseList, the use cases refer to any potential VO; in the AstroGrid web, http://wiki.astrogrid.org/bin/view/Astrogrid/UseCases, they refer to the resolution of the AstroGrid key science problems.

(21) The Open Source Initiative, http://www.opensource.org/, maintains a reference of approved open source licenses at http://www.opensource.org/licenses/index.php.

-- TonyLinde - 13 Sep 2002
