Results of Focus meeting on VO Usage held in Leicester on 21-22 Nov 2002:
(Day 2), AnitaRichards
(Day 2), PatricioOrtiz
(Day 1), TonyLinde
In the discussion section below, no attempt is made to identify who said what - everyone took part in all the discussions.
If any participant feels I've got the discussions wrong, please make the necessary corrections. Comments on the content can be made at the bottom or on the forum page.
- Job Control
- Data Centre & Permissioning
- Astronomical Tools
This two-day meeting was held to gain some agreement over how the AstroGrid VO would work when in use. The way we tackled this was to concentrate on the user interface of the VO: what screens would astronomers and administrators see; what would they be able to do and not do; what type of information needed to be accessible and how? This proved to be a successful approach. We covered a significant amount of ground and resolved many differences of opinion about how things would work. These notes and the accompanying photographs of the whiteboard graphics will act as a starting point for future component development. I'd like to thank everyone who was able to attend the meetings for their commitment and good humour over two hard-working days.
-- TonyLinde - 25 Nov 2002
- Content for main screens
- User-oriented workflow
- System use cases
- Component design constraints
- Non-Goals (if raised they'll be flagged as ISSUES and dealt with separately)
- Detailed design
- UI Layout
- Technical decisions
- All software is server-based & web accessed (HTML pref, possibly applets)
- No AstroGrid continuity is assumed:
- No Central organisation (so allow devolved management of VOs)
- No Central site (so allow for multiple installations)
- UK-VO Portal links through to Data Centres
- AstroGrid deployment by Data Centres
- Part or Whole of AG software
- Binding to local or remote components (ISSUE flagged - see below)
Portal: Left side
|| Right side
Job Control: Left side
|| Right side
MySpace: Left side
|| Right side
CAS/Registry/Data Centre Permissioning: Left side
|| Right side
CAS UI/Resource Management UI: Left side
|| Right side
Astronomical Tools UI
- Login: long debate over types of user and whether to allow non-certified users
- went with user types of:
- Guest: unidentified
- Registered: has registered with a portal and so name, email etc is known and has set preferences
- Authorised: has registered and provided certificate to authorise future work
- Debate on registration about whether to allow user to enter own password or system sends by email: chose to allow user to enter own
- A registered user can choose to upgrade to 'authorised' later by uploading certificate
- But what happens to the certificate? Is it:
- held by the portal (makes life easy but is less secure)
- uploaded each day (relatively secure but must be done from own machine)
- held by a single-signon (.Net Passport type) site (ease of use but less secure)
- Structure of the portal:
- allow user-defined layout
- portlets: panel included in portal; user chooses which portlets to display; portlet can be expanded or rolled up on screen (so only title shows)
- need easy way to add new portlets: standard interface definition so third parties can supply content/tools
- User profile
- include name, institute, address, email, mobile nr, notification method (email, SMS, ...)
- need to include Ts&Cs and data protection signoff
- allow language selection: initially English only
- portal must be built to allow easy translation
- Portal front page contains (within portlets):
- Name is displayed or 'Please login'
- login/logout facility
- list of site facilities
- list of new messages (from running jobs)
- incl: Job ID, Message, Action
- so a message must allow inclusion of a URL that user can click on to action the message (eg an interrupt message can bring up the Job Control screen at the appropriate point, or load a dataset which has been created into a visualisation package)
- portal news
- list of currently running jobs
- buttons for:
- new user
- access and change profile
- if Guest, register
- if Registered, get/upload certificate
- find me: facility to find details on other portal: not needed if we have AstroPass
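The three user types and the upgrade path between them can be sketched as below. This is only an illustration of the notes above, not an agreed design: the class and field names (`PortalUser`, `certificate_pem`) are assumptions, and the question of where the certificate is actually held is left open, as it was at the meeting.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UserType(Enum):
    GUEST = "guest"            # unidentified
    REGISTERED = "registered"  # name, email etc. known to the portal
    AUTHORISED = "authorised"  # registered and has provided a certificate


@dataclass
class PortalUser:
    # All field names here are hypothetical.
    name: Optional[str] = None
    email: Optional[str] = None
    certificate_pem: Optional[str] = None

    @property
    def user_type(self) -> UserType:
        # The user type is derived from what the portal knows about the user.
        if self.name is None or self.email is None:
            return UserType.GUEST
        if self.certificate_pem is None:
            return UserType.REGISTERED
        return UserType.AUTHORISED

    def register(self, name: str, email: str) -> None:
        self.name, self.email = name, email

    def upload_certificate(self, pem: str) -> None:
        # Only a registered user can upgrade to 'authorised'.
        if self.user_type is UserType.GUEST:
            raise PermissionError("register before uploading a certificate")
        self.certificate_pem = pem
```

Deriving the type from the stored details, rather than storing it separately, means a registered user becomes authorised simply by uploading a certificate later.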
- this aspect was considered in conjunction with the document, WorkFlowArchitecture
- but why the Programme-Job-Plan-Action structure?
- maybe merge Job and Plan?
- so job can contain other jobs, job is DAG of actions, programme is simple list of jobs
- but maybe Plan is useful to distinguish Job (which is scheduled and run) from non-specific set of Actions that can be reused
I'm wondering if I was right to have the four levels of workflow based on the last point above. I'll make the notes follow what was said at the meeting but it might be worth reinstating the four levels as:
- Programme: contains all jobs documented under some project
- Job: is the actualisation of one or more plans, with start and end times and details of actual web services chosen (whereas Plan may contain type of service required)
- Plan: a Directed Acyclic Graph (DAG) of actions
- Action: specific action such as querying the registry, sending a message, deleting file etc.
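To make the four levels concrete, here is a minimal data-structure sketch. It follows the definitions above but every class and field name is an assumption made for illustration. A Plan only lets an action depend on actions already added, so the graph stays acyclic by construction.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Action:
    name: str
    kind: str  # e.g. "query registry", "notify user"


@dataclass
class Plan:
    """A DAG of actions: each action lists the actions it depends on."""
    actions: Dict[str, Action] = field(default_factory=dict)
    depends_on: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, action: Action, after: Iterable[str] = ()) -> None:
        after = set(after)
        if action.name in self.actions:
            raise ValueError(f"duplicate action: {action.name}")
        missing = after - set(self.actions)
        if missing:
            raise KeyError(f"unknown dependencies: {sorted(missing)}")
        self.actions[action.name] = action
        self.depends_on[action.name] = after


@dataclass
class Job:
    """Actualisation of one or more plans, with the services actually chosen."""
    job_id: str
    plans: List[Plan]
    services: Dict[str, str] = field(default_factory=dict)  # action name -> service URL
    started: Optional[datetime] = None
    ended: Optional[datetime] = None


@dataclass
class Programme:
    """All jobs documented under some project."""
    project: str
    jobs: List[Job] = field(default_factory=list)
```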
- graphical tool for constructing Job (or Plan if we stick to the four levels)
- contains toolbox of action types: query registry, query archive, copy data, notify user, ...
- adding action to a job will bring up a properties box where data specific to the action can be entered (eg keywords & constraints on a registry query)
- can draw dependency links between actions
- so job controller will not start dependent job until first one finished
- actions may have Input and Output data
- drawing a link from output data of one job to input data of another job implies dependency and that output of the one is used as input to the other
- if data formats do not match, job builder will insert a Converter action between the jobs
- can also draw Pipeline between two actions
- this assumes streaming data between two actions
- job controller will create pipeline then initiate both actions and inform each of how to find the pipeline
- a job is specified using XML (possibly BPEL4WS)
- actions can be preceded or followed by notifications
- user can recall previous job and change action details before resubmitting it
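The dependency and Converter behaviour described for the job builder can be sketched as follows. This is an illustrative sketch only: the representation of actions as `(input_format, output_format)` pairs, the converter naming and the format names are assumptions, and the ordering uses Python's standard `graphlib` module (Python 3.9+) rather than anything specified at the meeting.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Tuple

Formats = Tuple[Optional[str], Optional[str]]  # (input format, output format)
Link = Tuple[str, str]                         # (producer, consumer)


def insert_converters(actions: Dict[str, Formats], links: List[Link]):
    """Insert a Converter action on every link whose data formats differ."""
    actions = dict(actions)
    new_links: List[Link] = []
    for producer, consumer in links:
        out_fmt = actions[producer][1]
        in_fmt = actions[consumer][0]
        if out_fmt != in_fmt:
            converter = f"convert:{producer}->{consumer}"  # hypothetical naming
            actions[converter] = (out_fmt, in_fmt)
            new_links += [(producer, converter), (converter, consumer)]
        else:
            new_links.append((producer, consumer))
    return actions, new_links


def run_order(actions: Dict[str, Formats], links: List[Link]) -> List[str]:
    """Order in which a job controller could start the actions."""
    graph = {name: set() for name in actions}
    for producer, consumer in links:
        graph[consumer].add(producer)  # consumer waits for producer
    return list(TopologicalSorter(graph).static_order())


actions = {"query": (None, "VOTable"), "plot": ("FITS", "PNG")}
actions, links = insert_converters(actions, [("query", "plot")])
print(run_order(actions, links))
```

`TopologicalSorter` raises `CycleError` if the links form a cycle, which is one cheap way for a builder to reject an invalid job before it is submitted.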
though, using the language above, it is an interactive action
- it was considered that astronomers will likely use an interactive feature more than the job builder
- it will be run immediately (or submitted immediately - running depends on the availability of dependent resources)
- can be saved and later rerun or incorporated into job/plan
- will be similar to VizieR
- keyword selection
- selection by parameters: position, wavelength, epoch, error(s)
- can specify columns to be returned
- can generate row count, object count, catalogue list, ...
- if query of archives can choose to either:
- catenate all results or
- merge on object:
- will allow user to select merge type and how it is optimised
- this was felt to be a killer feature
- Key issues:
- how to estimate query duration and cost
- need some form of action Governor:
- limits query execution time
- alerts user
- allows restart (!?)
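One possible shape for such a Governor is sketched below: it wraps the stream of result rows, stops once a time limit is passed, alerts the user and reports how many rows were delivered so that a restart could resume from that offset. The function and exception names are assumptions, and the check is cooperative (made between rows), so it would not interrupt a single row fetch that hangs.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


class QueryTimeLimitExceeded(Exception):
    def __init__(self, rows_seen: int):
        super().__init__(f"time limit exceeded after {rows_seen} rows")
        self.rows_seen = rows_seen  # possible restart offset


def governed(
    rows: Iterable[T],
    max_seconds: float,
    alert: Callable[[str], None],
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield rows until max_seconds has elapsed, then alert and stop."""
    start = clock()
    for count, row in enumerate(rows):
        if clock() - start > max_seconds:
            alert(f"query stopped after {count} rows: limit of {max_seconds}s exceeded")
            raise QueryTimeLimitExceeded(count)
        yield row
```

The `alert` callable stands in for whatever notification method the user has chosen in their profile (email, SMS, ...).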
The following was a stab at an initial list of Action types:
- query registry
- query archive
- interrupt: send email or SMS
- email status and progress of job/action
- run application
- move data
- join/federate data
- convert data format
- download data
- upload data / code
- listen / wait
- delete data
The following is a look at some of these types of action:
- need to specify the type of resource wanted
- set some constraint-value pairs (where value might be a range of values)
- in boolean form (so linked by ANDs, ORs etc.)
- select return fields
- could be stored as XPath/XQuery
- but this could be difficult to decode if action is recalled
- specify archive, dataset, table
- set constraint-value pairs
- constraint might be UCD, computed field, error
- select return UCDs / columns (including computed fields)
- could be stored as XPath/XQuery
- but this could be difficult to decode if action is recalled
- metadata specifies parameters (& their datatypes, validation criteria): schema
- also performance characteristics (dubious in initial stages)
- select app name and specify parameters
- typically processing results from archive query
- select join type (ie, what data to join on)
- and the actual columns in each table (drag links)
- specify how much to 'shift' data in each table (due to epoch) and how
- specify match resolution (eg match position data if within 0.1 arcsec)
- specify result columns
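The match-resolution idea in the join action (eg matching positions within 0.1 arcsec) can be illustrated with a simple positional cross-match. This is a brute-force sketch for small tables only; the function names and the use of plain `(ra, dec)` tuples in degrees are assumptions, and no epoch shift is applied.

```python
import math
from typing import List, Sequence, Tuple

ARCSEC = math.radians(1.0 / 3600.0)  # one arcsecond in radians

Position = Tuple[float, float]  # (RA, Dec) in degrees


def separation(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Angular separation in radians (haversine form, stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(h))


def cross_match(table_a: Sequence[Position], table_b: Sequence[Position],
                radius_arcsec: float = 0.1) -> List[Tuple[int, int]]:
    """For each row of table_a, return the nearest row of table_b within the radius."""
    limit = radius_arcsec * ARCSEC
    matches = []
    for i, (ra_a, dec_a) in enumerate(table_a):
        best = None
        for j, (ra_b, dec_b) in enumerate(table_b):
            d = separation(ra_a, dec_a, ra_b, dec_b)
            if d <= limit and (best is None or d < best[1]):
                best = (j, d)
        if best is not None:
            matches.append((i, best[0]))
    return matches
```

Each row pair is compared, so the cost grows as the product of the two table sizes; a real join service would need some form of spatial indexing.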
The Job Monitor screen would show:
- current jobs (running or ready to run), with Job ID, Name, Status with options to:
- start/resume job
- stop/pause job
- monitor job
- show job details, including:
- job parameters
- actions with
- previous jobs
- event log for:
- selected job/actions
- all owned jobs
- user root address is //CASname/username/, eg //ledas.le.ac.uk/TonyLinde/
- CASname is the name of the CAS server or the community identifier where user first registered
- username is the name which uniquely identifies the user within that community
- Single URI/URL for each entity stored
- appears to user as a single tree of folders and entities
- regardless of how many domains the entities are stored across
- each 'provider' of MySpace services must be able to retrieve the tree structure for any user
- do we need the whole tree structure for every user to be replicated worldwide? if not, what happens if home node is down when provider needs to get the info?
- entities stored could be: files, database tables, (anything else?)
- is accessible from a web(/grid) service interface
- a provider of a MySpace service is similar to a data centre
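As a small illustration of the addressing scheme above, the sketch below splits a MySpace address into its community, username and folder path. The `//CASname/username/` form comes from the notes; the function name, the tuple layout and the example file path are assumptions.

```python
from typing import NamedTuple, Tuple


class MySpaceAddress(NamedTuple):
    community: str          # CAS server or community identifier
    username: str           # unique within that community
    path: Tuple[str, ...]   # folders and entity name below the user root


def parse_address(address: str) -> MySpaceAddress:
    """Parse an address of the form //CASname/username/folder/.../entity."""
    if not address.startswith("//"):
        raise ValueError("address must start with //")
    parts = [p for p in address[2:].split("/") if p]
    if len(parts) < 2:
        raise ValueError("address needs both a community and a username")
    return MySpaceAddress(parts[0], parts[1], tuple(parts[2:]))


print(parse_address("//ledas.le.ac.uk/TonyLinde/"))
# The path below is a made-up example, not a real entity.
print(parse_address("//ledas.le.ac.uk/TonyLinde/results/run1.fits"))
```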
This is the screen used to manage and browse an individual's space:
- user logged in is identified by name
- UI shows two panels:
- tree structure: folders and subfolders
- entities within a folder and each one's properties
- entity properties include:
- domain of service hosting entity (expandable to URL of web service)
- community/group which 'owns' entity
- expiry date of space reserved for entity
- type of entity, eg: file|binary, database|SQL Server|table, shortcut
- can store links to other entities (shortcuts)
- options to execute:
- search: search own space for entities; search public MySpace for entity
- properties: shows detailed information about folder/entity:
- addressable URI
- actual URL (as of that time)
- community/group: like group in Linux, I guess; what does this mean?
- nearness: what does this mean? how do we measure it?
- delete, rename, copy, move ...
- show domains which this user has access to: how do we maintain this?
- domain name & service URL
- quota at this domain for: community | group | personal
- but user may belong to multiple groups (& communities?)
- quota available to this individual
- type of quota
- long term storage (may have expiry date)
- temporary storage for transitional files/tables (may have duration)
- expiry or duration of quota
- search log
- for all refs to this entity => type of provenance?
- so every entity must have a unique, forever identifier behind it
- creates copy or shortcut of existing entity (use entity's URI)
- cannot actually create new entity in this way; only via actions
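The quota information listed above (domain, scope, type, expiry) might be held in a record like the one below, with a simple check before an action stores a new entity. Field names and the use of megabytes as the unit are assumptions; the notes do not say how quota would be measured.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Quota:
    domain: str                     # domain of the MySpace provider
    scope: str                      # "community" | "group" | "personal"
    kind: str                       # "long-term" | "temporary"
    limit_mb: int                   # assumed unit: megabytes
    used_mb: int = 0
    expires: Optional[date] = None  # None means no expiry date set

    def can_store(self, size_mb: int, today: date) -> bool:
        """True if the quota is still valid and has room for size_mb more."""
        if self.expires is not None and today > self.expires:
            return False
        return self.used_mb + size_mb <= self.limit_mb
```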
CAS/Registry/Data Centre Permissioning
CAS, the Community Authorization Service, is a Globus concept (and product) which allows resource providers to specify coarse-grained access control policies in terms of communities as a whole, delegating fine-grained access control policy management to the community itself. Resource providers maintain ultimate authority over their resources but are spared day-to-day policy administration tasks (e.g. adding and deleting users, modifying user privileges).
The Resource Registry is a general VO concept which has no standard at the moment. The registry is intended to provide pointers to all the resources available in the VO. There are differences of opinion over whether it should be coarse-grained (so, in terms of data, only provides details on data centres and queries are then resolved by those data centres) or fine-grained (so all metadata about any entity of interest is stored in the registry and early queries can be resolved without reference to the data sets themselves). For now, AstroGrid has chosen to proceed with a fine-grained model.
A Data Centre will deploy those AstroGrid services which provide access to its data, whether they are flat files, database tables or esoteric data entities which can only be accessed through arcane acts of programming logic. The key is that the services will have standard interfaces so that queries are framed in a generic form (precursor to the infamous AQL) and translated by a service at the data centre into a form understandable by their data management programs. A data centre may also provide MySpace services and application services (ie access to software which can be run against the available data).
We debated for a while the nature and usage of the Registry and how it was updated. Its content is detailed in the lists below.
For updating, it was undecided whether responsibility for keeping the Registry up to date should lie with the owners of an entry in the Registry or with the Registry itself via some autonomous agent. On the one hand, an agent would relieve the data centre of the task of ensuring that the registry knew of every change it made but, on the other hand, the data centre would have to provide some means of ensuring that the agent found out what data had changed and this might be more onerous than updating the registry directly.
In the end, it was decided that both methods would be beneficial. Agents would prove particularly useful in detecting missing resources, eg URLs which were no longer operational, and might be used for detecting incremental changes in data volumes for a specific data set (providing a standard method of getting that data was available - eg a call to the web service fronting that data set), but in the case of adding new resources to the registry, this must remain the task of the resource owner.
For any given service, the registry will contain:
- identifier, name and description
- URL of web service interface
- information about contact persons, host institute, etc.
The types of service that will be registered include:
- Data centre
- Application/tool service
Other types of information recorded include:
- Data source, including:
- link to host Data Centre
- link to known replicas
- accessibility info
- source & access metadata
- query constraints, eg flat files so minimal search capability, rdbms table so supports generic SQL
- performance constraints, eg on backup tape so avg 48 hour delay
- possibly provenance and quality metadata
- access policy document (in XML form): see below
- Person, including:
- link to host community (ies?)
- link to host MySpace provider (where tree info is stored)?
- name, address, ...
- Group, including:
- link to host community (definitely only one)
The data centre would provide a web service interface to its available products. Individual data sets may also have their own web service interface. In general, a query to one of these data sources would be received and translated from the generic query form into a call to the local software program which deals with that source.
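As an illustration of translating a generic query into a call on local software, the sketch below turns a table name, a list of return columns and a set of range constraints into parameterised SQL and runs it through Python's standard `sqlite3` module. The generic query form shown here (ranges ANDed together) is an assumption made for the example; it is not AQL, and a real data centre would target its own data management programs.

```python
import sqlite3
from typing import List, Sequence, Tuple

Constraint = Tuple[str, float, float]  # (column, low, high)


def _check(name: str) -> str:
    # Identifiers cannot be passed as SQL parameters, so validate them.
    if not name.isidentifier():
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def to_sql(table: str, columns: Sequence[str],
           constraints: Sequence[Constraint]) -> Tuple[str, List[float]]:
    """Translate a generic range query into SQL text plus parameter values."""
    sql = f"SELECT {', '.join(_check(c) for c in columns)} FROM {_check(table)}"
    where = " AND ".join(f"{_check(col)} BETWEEN ? AND ?" for col, _, _ in constraints)
    if where:
        sql += f" WHERE {where}"
    params = [value for _, low, high in constraints for value in (low, high)]
    return sql, params


def run_query(connection: sqlite3.Connection, table: str,
              columns: Sequence[str], constraints: Sequence[Constraint]):
    """Run the translated query against a local database."""
    sql, params = to_sql(table, columns, constraints)
    return connection.execute(sql, params).fetchall()
```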
The data centre is responsible for implementing the data access policy for each data source. This policy document is stored (in the AstroGrid fine-grained registry model) along with the other metadata for the data source in the Registry. A standard for this policy document needs to be developed.
A data centre which chooses to make all of its data sources freely available can ignore all aspects of data permissioning.
As well as storing policy in the registry, all access to data will be filtered through some local policy checking algorithm. In the case where a data centre currently enforces such policy with locally maintained usernames on its machines, this can be continued by mapping each person's community/username identifier to the local usernames. In the longer term, this will be replaced by mapping policies directly to the community/username identifiers.
A user may choose to make an entity in their MySpace available to certain others. They will have the ability to do so and, in this case, may attach an access policy document to the entry in the registry.
CAS: Community, Groups, Individuals
A user of the VO will register with some community; this will become their home community - the community which provides them with the domain part of their unique identity (eg //ledas.le.ac.uk/TonyLinde). Note: this need may be redundant if we implement a single-signon AstroPass facility.
A community will be hosted on a server; one or more communities may be hosted on the same server. A person may register with more than one community. It is up to the community to decide whether or not to accept a person as a member.
Within a community, groups may be formed. A group is any collection of individuals with some common purpose. A group may include individuals from outside the community; such people do not thereby become members of the community.
The whole purpose of the community/group structure is to allow data owners to assign permissions against their data sources to either groups or communities, and to delegate the management of the membership of that group or community to the group itself.
Each community or group will have its own management policies, including:
- who acts as administrator and what levels of administration they have
- who has the ability to create a group
- who can add new members to the group and whether the new members are from the community, some super-group or are external
Thus it is a matter of trust between the group and the data owner as to who has the right to access the data source. For publicly available data, this is not an issue.
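The way a data owner's policy and community-managed membership combine can be sketched as below. No standard for the access policy document exists yet, so the `AccessPolicy` fields, the function name and the example identifiers are all assumptions; the point is only that the data owner names communities and groups, while membership is maintained elsewhere.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class AccessPolicy:
    # What the data owner records against a data source (hypothetical fields).
    public: bool = False
    communities: FrozenSet[str] = frozenset()
    groups: FrozenSet[str] = frozenset()


def is_allowed(policy: AccessPolicy, home_community: str,
               group_memberships: Iterable[str]) -> bool:
    """Decide access from the policy plus membership asserted by the community."""
    if policy.public:
        return True
    if home_community in policy.communities:
        return True
    return bool(policy.groups & set(group_memberships))


# Made-up identifiers for illustration only.
policy = AccessPolicy(groups=frozenset({"//community.example/survey-team"}))
print(is_allowed(policy, "other.example", ["//community.example/survey-team"]))
```

A member added to the group from outside the community gains access through the group without appearing in the policy document at all, which is the delegation described above.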
An interactive query will generally not check whether the user has the correct rights to access a specific set of data. Such queries will be passed to the data centre/source and permissions will be checked at that point. This may make such a query slower since the data source will need to check what groups the user belongs to. It may be that a data source simply rejects such unqualified queries. The user who 'knows' what rights are required to get at a certain data source may choose to attach authorisation to the query so that it is passed by the data source.
A job which is built by the user will check the access policy document of every data source required. When the job is submitted and an action calls for access to a restricted data source, the job controller will attach the user's certificate plus authorisation credentials (basically a verification that they belong to a group allowed access to the data) to the query that is sent to the data source. At the moment this requires the use (in Globus) of proxy certificates - it is understood that this might be problematic in network circles - we will need to track this issue.
CAS: Community Management (CM) UI
Having spent some time discussing the way that CAS, the Registry and a Data Centre would interact, we looked at what the screens would look like for managing these areas. Firstly, the CM UI...
These screens would allow a user to select the community they want to address. On selecting a community, a simple panel would display the community metadata from the registry. The user can then choose to log in. If they are a member of the community, the screen will change to display the groups and members of the community, and, depending on the administrative level of the user and the rights that the community has assigned to ordinary users, will offer the option to:
- create a new group
- change the metadata for a group
- including option to inherit permissions from some parent group (ie create subgroup)
- remove a group
- suspend a group (so its rights cannot be used to access data)
- add users to a group:
- from within the community
- from outside the community
- remove users from the group
- look at resources to which a group has access
- add a new user to the community
- remove a user from the community
- assign CAS admin rights to a user
- check the event log for groups and users
Management of communities, groups and members is nothing new; we may be able to adapt existing software to our purposes and, even if not, can use existing software for ideas.
Resource Management (RM) UI
The RM screen will be used by those who own or control resources (data sources, applications, tools etc.). The RM component will be deployed along with a Registry replication node. Anyone who wants to maintain resources belonging to some community (eg a data centre, service provider) will log into the RM at the local node. The system will verify that they are a valid resource manager and display the resources of their community that they are authorised to manage.
The screen will list the resources and, in a separate panel, the properties (metadata) for each resource as it is selected on the screen. The authorised user can alter the metadata properties and can create/amend access policy documents, possibly by associating the resource with communities, groups and members from scrolling lists.
Most of the work on the registry is done through other screens. We could just about imagine needing a screen for disabling or deleting a resource which could no longer be accessed (and whose owners no longer existed). Perhaps a screen that simply listed resources and their metadata (though some resources might opt not to have their metadata on display, or even not to appear in the resource list at all) in an Explorer
The Registry is envisaged to work like the network DNS system whereby changes are replicated around the network within some period (24 hours?). Just so, registry nodes will replicate changes made locally round the VO.
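How nodes would reconcile replicated changes was not discussed. Purely as an assumption for illustration, the sketch below uses a "newest change wins" rule keyed on a per-entry timestamp; the function name and record layout are hypothetical.

```python
from typing import Dict, Tuple

# resource identifier -> (timestamp of last change, metadata)
Entries = Dict[str, Tuple[float, dict]]


def merge_changes(local: Entries, incoming: Entries) -> Entries:
    """Merge changes received from another registry node; the newer entry wins."""
    merged = dict(local)
    for resource_id, (stamp, metadata) in incoming.items():
        if resource_id not in merged or stamp > merged[resource_id][0]:
            merged[resource_id] = (stamp, metadata)
    return merged
```

A rule like this relies on node clocks being roughly in step, which is one of the things a real replication design would need to address.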
Astronomical Tools UI
We discussed three different types of astronomical tools:
For these, we need to produce a toolkit which will help someone build a wrapper for an existing tool. The wrapper would present a web service interface to the tool and allow it to be plugged into a portal.
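The core of such a wrapper might look like the sketch below: it declares the parameters a command-line tool accepts, so they can be published as metadata, and invokes the tool with validated values. The class name, the `--name=value` argument convention and the `describe`/`invoke` methods are assumptions; the web service layer that would sit in front of it is not shown.

```python
import subprocess
from typing import Dict, List, Sequence


class ToolWrapper:
    """Wrap an existing command-line tool behind a describe/invoke interface."""

    def __init__(self, executable: Sequence[str], parameters: Sequence[str]):
        self.executable = list(executable)  # command and any fixed arguments
        self.parameters = list(parameters)  # parameter names the tool accepts

    def describe(self) -> Dict[str, List[str]]:
        # Metadata a portal or registry could use to build a parameter form.
        return {"parameters": list(self.parameters)}

    def invoke(self, **values) -> str:
        unknown = set(values) - set(self.parameters)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        argv = self.executable + [
            f"--{name}={values[name]}" for name in self.parameters if name in values
        ]
        # check=True raises CalledProcessError if the tool exits with an error.
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        return result.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems when values come from a web form.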
Or the tool might make use of an XDisplay within an applet: for these types we need to look for some Java-based XServer tools.
has dropped that aspect of its project to deal with the development of server-based visualisation due to lack of resources. We could imagine three types of solution here:
- an Aladin-type of applet, but this requires downloading the data to the desktop
- a server which creates a JPG/PNG/GIF file and downloads that - might be slow to respond to dynamic parameter adjustment
- someone buys an SGI server and implements it on the VO!!
These tools will require cut-out services to be incorporated.
Client interface to VO
We will need to build some Java, C and Perl libraries for accessing the VO so that client-side tools can be developed according to the standards we specify.
Will also need libraries to get/put data and to convert between common data formats (these are under development now in other VOs).
Thanks to Keith for tracking these
- Data Sources
- How to bind to local and remote components when deploying VO software (so locally bound components can take advantage of faster messaging capabilities, remote bound will use messaging component - XMLBlaster possibly)
- Binding multiple data sources (remote and local)
- Dynamic binding by applications
- How will databases be replicated
- How might other data sources (FITS files, images etc) be replicated
- and what are the implications for data provenance
- How will web services and the portal be secured
- How will user certificates be issued and managed
- How will this work if AstroGrid doesn't supply them
- Will the registry contain details of both local and remote applications
- Will a registry agent add resource information
- Will such an agent change the metadata
- Design: hierarchical or relational
- Job Control
- How will jobs be interrupted
- How will jobs be run on other portals
- What is the lifetime of MySpace
- Can MySpace be made permanent
- What naming schemes will be used for users and groups
- Will there be a toolkit to assist portlet building
- Will portlets have a common look and feel
- How do we implement single-signon across multiple VO portals while retaining the option for certificate upload (certificate resides on own machine)?
- have the signon site (AstroPass?) hold the certificate
- how is this any better (ie more secure) than just logging in to portal
- allow service providers to choose whether they'll accept AstroPass signon or require certificated authorisation
- Language: how to build portal/app so multi-language sites can be catered for
- 21 Nov 2002
And now for a few pictures not really related to VO Usage: