r15 - 27 Feb 2007 - 12:59:37 - TonyLinde

AstroGrid Deep Review: Tony Linde's Thoughts

Introduction

In January 2007, AndyLawrence announced that the AstroGrid project would undertake a FitnessReview of how it is (or why it is not) being used within the UK. There are two strands to this review:
  1. user survey
  2. deep ergonomics review

The user survey (AstroGridUserReview) strand is led by NicholasWalton and is aimed at people who have been exposed to the AG system (primarily via the AgWorkshops). It will consist of a detailed questionnaire, which everyone currently registered on the system will be asked to complete, followed by a few one-to-one interviews.

The deep ergonomics review (AstroGridDeepReview) is led by AndyLawrence and meant to 'critically analyse how users interact with the current system; brainstorm proposals on how this could change; analyse pros and cons of possible changes'. The reason behind this review is the recognition that even though we have over 400 persons registered as users of the system, few of them continue using the more radical aspects of it (see Ref 01), even after attending workshops (and this even though reports from the workshops seem to be positive).

I've been asked to serve on the panel for this second strand and I'm using this page to collect my thoughts together (none of this should be taken in any way to reflect the opinions of the project, nor of anyone else working on it, nor anything but my own interpretation of events and my own desired direction for the future of the project).

Evolution of UK-VO

The perception of AstroGrid has changed over the years. In the first (April 2001) proposal for the project (02), the goals encompassed:
  • working datagrid
  • datamining facilities
  • uniform archive query
  • tools for on-line analysis, database analysis and exploration
  • techniques for resource discovery

As can be seen, these are predominantly data-oriented, prompted by the imminent arrival of vast amounts of data from new missions. Computation was also addressed in the need for data mining facilities, including remote computation. When I joined the project in Nov 2001, the AGLI accepted my idea that none of the above could be delivered without first building the infrastructure to support it. This meant the ability to identify and authorise persons to do specific tasks or access specific data, to construct a flow of tasks for remote execution (workflow), to provide offline storage as well as uniform querying, data mining etc.

This approach resulted in the architecture outlined in the Phase B proposal (03). This document stated as its end goal 'to produce software which will enable the creation of a working, grid-enabled Virtual Observatory (VO) based around key UK astronomical data centres.' The focus moves from actually creating the VO to that of enabling the creation of the VO via the software infrastructure which was the end goal of Phase B. The project still had the goals of providing a working datagrid and all the above features but the focus on infrastructure was there.

After the first AG project, the proposal for AG2 was submitted (04). This intended to 'develop a future Virtual Observatory infrastructure for the UK that delivers powerful analysis facilities, is matched to key facilities and missions, is integrated into the European scene, and backs UK data centres in international competition.' It included the intention to work on:

  • the core infrastructure
  • science user tools
  • automated resource discovery techniques
  • techniques in Grid technology, visualisation and datamining
along with funding for a data centre alliance and outreach activities. The latter were not funded, and nor was the work on the tools, resource discovery and data mining (these later formed the foundation of the VOTech project). The reduced funding (reduced from what was asked for - it was actually commensurate with AG1) meant that little more than the infrastructure could be delivered. The project has delivered some useful science tools - mainly by wrapping existing tools that can be invoked from the command line - and a client-side workbench which demonstrates the functionality of the infrastructure. Some other client-side tools (TopCat, GAIA etc) are being made VO- and AG-aware.

The project has just submitted a bid for further funding (document not publicly available), with the intent to 'construct a fully operational and sustainable VO service for UK astronomers'. The focus of the proposal is still on the AG infrastructure: providing a UK-VO service based on that infrastructure; minimal funding is sought for new tools.

Conclusions

The Virtual Observatory requires more than the existing web infrastructure and it is this that AstroGrid provides. The project has done a tremendous amount of work in this field - work that is now being recognised as revolutionary by other VO projects. Most of the infrastructure has been available for over 18 months and has been extended and made more stable over that time. By the end of 2007, it will be complete in a robust, performant and scalable fashion and available for rollout to service centres from 2008 onwards. In addition to the infrastructure, there will be a client-side UI (the AstroGridWorkbench) and a few science tools available as server-side applications as well as sample workflow scripts.

My View

It is my opinion that AstroGrid is about infrastructure: it provides the software to create a VO infrastructure, it has deployed that software to provide VO services and it will support and maintain those services in a production environment from 2008 onwards. But, just as a real observatory is about more than the instruments which are designed, built and maintained by the original project team, so the VO is much more than its infrastructure components. A real observatory project generates data products and makes them available via some interface (web page, email etc). Scientists then use those products or subsets of them (via queries): they subject the data to various types of analyses, plugging it into tools of different types, running the results through other analyses; they visualise the data and the results or plug them into statistical packages; they write their own programs or scripts to automate these processes; they take the data or the results and run them against other sets of data. Basically, they do thousands of different things to the data, each of them differently from the scientist sitting alongside them.

The idea that AG can meet the needs of every astronomer in the UK with two or three wrapped tools every six months and the workbench is not tenable. How many separate routines are there in the Starlink, Solarsoft, IRAF etc libraries? And what percentage of the typical astronomer's daily work is met by these libraries? To meet the needs of the UK's astronomers, a vast amount of software needs to be made VO-aware. And then think of how much more is possible using the facilities provided by AstroGrid. What new ways of analysing data might be provided to astronomers if tools writers were to have access to the VO infrastructure provided by AstroGrid?

AstroGrid will never have more than a handful of users until the tools that astronomers want to use are made capable of using the AG infrastructure. And the VO will never take off in the UK until new tools that fully exploit the AG infrastructure are developed by astro-developers.

Counter views

My view is not one that is held by everyone else in AstroGrid (perhaps by no-one else).

One view is that AG must provide its services to the UK astronomer community in order to kick-start the VO. But as we've already seen, AG is simply not gaining serious users, no matter how many go through the workshops. The idea is that 'fixing' the AstroGrid UI will bring astronomers flocking to the VO. But that presupposes that the half-dozen functions provided by the workbench are all that astronomers in the UK need - not, in my opinion, a tenable idea.

Another view is that astronomers are not using AG because it is not stable enough. But the software is stable enough that, if it were providing a useful service to astronomers, they would persist with it. How long did people stick with early versions of Windows even though it died several times a day on them? AG does not do what astronomers want. The fact is that people are simply not using the software at all, rather than using it and finding that it fails: if the latter were the case, we'd be inundated with calls from astronomers asking for problems to be fixed - that is not happening.

Back to me

Although I do not agree that the UK-VO service has to be provided by a central project, I can see that it can work: AstroGrid-3 can both maintain the infrastructure software and operate it as the UK-VO service. But, this will not get people using the service. The infrastructure service is one that will be used by other services, not by astronomers. And the few user interfaces and tools provided (and that might be provided) by the project are hugely insufficient to meet the needs of the UK's astronomers.

Until there is a wide effort to engage with UK astro-developers and until PPARC funds those people to deliver the tools and interfaces that astronomers want, the UK-VO is going nowhere.

The Deep Review

So, what of the Deep Ergonomics Review? I think we need to (at last) find out what it is that astronomers want. We need to catalogue what it is that the variety of UK astronomers do (05) and list those tasks which they currently do by computer which might be enhanced by AG services. We should then imagine how we might automate (with the benefit of AG services) those tasks which are currently repetitive or difficult or which require special expertise.

But the starting point ought to be:

What do astronomers do?

The first question, I guess, is how we find this out. Some ideas:
  • ask the science team: I tried to do this at TonyOnAstroWorkplan but only Anita responded. My motivation for this page was to provide semantic add-ons to the existing system - and then used it to inform the AstroVRE proposal. But it still might be worth expanding it.
  • ask the tech team: shock, horror: techies may not be able to say what users want but they can have a better idea of how to deliver it than the users or the science team: 30 years of building user interfaces are valuable!
  • ask our own colleagues: I did have an idea of doing this with the Leics astros: invite them to a 1-2 hour session, spend 10 mins explaining what the VO was about (not AstroGrid, no demonstrations) and then ask them to imagine what they might do with VO facilities were they available (I'm talking of generic facilities such as access to any data via a common interface, offline storage, workflow, server-side apps, HPC power etc).

Then what? What do we want to know? If we don't take the very generic approach I outline above, we need to ask some more focussed questions:

  • how do astros look for data
    1. do they look for databases or do they already know which database they want to use
    2. if they look for databases, how do they look:
      • ask a friend
      • google
      • search the literature
    3. and what criteria do they use to search
      • wavelength, coverage, ...
      • mission, archive, ...
  • and when they've found a database, how do they find the data of interest
    1. simple query
    2. cross-matching

  • and what about tools: how do astronomers find the right tool to analyse, manipulate or view their data
    1. as above: ask a friend, google, literature
    2. check the main libraries: Starlink, Solarsoft, IRAF, ...
    3. write their own

But if I am wrong...

Hard to imagine :-), but what if my analysis of the AG situation is wrong and we do only need to tweak the UI, add another function or tool and the astronomers will flock to use AG? Here I'll add my own ideas of what is wrong with the current system and what we might do to get astronomers using it.

Portal

OK, this is cheating and owes more to my vision of letting astro-devs loose on the AG system, ie develop a web-based portal which others might customise to provide their own front-end to the AG components. But maybe astros do prefer to use web-based interfaces to do certain things such as getting data. There's no doubt that Taverna will provide a much richer way of building a workflow but for all other VO tasks - querying databases, checking job status, myspace management, ... - maybe a web interface is better.

Anita made a comment about this the other day (Day 1 of the C05 Tech planning meeting), viz that if we had a server-side version of the AstroRuntime, a data centre could then offer users two ways of getting data from its website: the normal 'query and download' or a 'query and post to myspace' option. This might be a good way of getting astros at least aware of AG.

Registry

Finding stuff in the registry is difficult but this has been acknowledged for some time. Perhaps RegistryScope will solve the problem but let's dig a bit deeper into why anyone would want to use the registry. Basically because they want to find something: data or application, but probably mainly data. I must say I'm struggling here since I'm not a working astronomer and don't know what an astro would want to search the registry for or how (one topic for the Deep Review above: done).

One approach would be a more semantically enriched registry - the goal of VOTech DS5. Will chase this up.

Certainly, for the task launcher or workflow builder, it might be useful to let the user narrow what they want to search the registry for before popping up a simple search box. Perhaps, at the least, let them decide first if they want to create a data step or application step and then filter the search for datasets or apps. More difficult but perhaps useful for beginners would be a 'wizard' which steps the user through various filtering steps (data->wavelength->coverage; application->?): thinking about it, this can probably be done in a simple form.
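To make the narrowing idea concrete, here is a minimal sketch in Python of a search that filters first by kind (data or application), then by wavelength, and only then by free text. Everything in it is an assumption for illustration: the Resource record, the narrow function and the sample entries are invented, and none of it reflects the actual registry schema or any AstroGrid interface.

```python
from dataclasses import dataclass


# Hypothetical record type: real registry entries carry far more metadata.
@dataclass(frozen=True)
class Resource:
    name: str
    kind: str             # "data" or "application"
    wavelength: str = ""  # e.g. "optical", "x-ray"; left empty for applications


def narrow(resources, kind, wavelength=None, text=""):
    """Apply the wizard's choices in order: kind, then wavelength, then free text."""
    hits = [r for r in resources if r.kind == kind]
    if wavelength is not None:
        hits = [r for r in hits if r.wavelength == wavelength]
    if text:
        hits = [r for r in hits if text.lower() in r.name.lower()]
    return hits


# Invented sample entries, for illustration only.
SAMPLE = [
    Resource("Example optical survey", "data", "optical"),
    Resource("Example x-ray catalogue", "data", "x-ray"),
    Resource("Example source extractor", "application"),
]

# A user who chose "data step" and then "x-ray" sees one candidate.
data_hits = narrow(SAMPLE, "data", wavelength="x-ray")
# A user who chose "application step" and typed "extractor" sees one tool.
app_hits = narrow(SAMPLE, "application", text="extractor")
```

The point of the ordering is that the free-text box only ever searches a list the user has already cut down, which is what a simple form with two or three drop-downs would do.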

MySpace

When I was trying to put together the AstroVRE proposal, one comment I received from the SWIFT team was that the tool that would most benefit them would be an online storage area for datasets, results etc which was available project-wide or could be shared between two or more people. Perhaps if MySpace allowed users to share folders with others (individuals or groups), and/or if groups (eg a group which contained a project's members, with subgroups for task members) could 'own' an area of myspace just as individuals can, this would garner more interest.

Workflow

One thing I've heard many times in the past five years is that astronomers tend to pick a bit of data, run it through some tools, pick some more data and do the same, run the results through more tools and so on. We need to fit, I think, the workflow concept around this way of working rather than expect astros to modify their way of working to our concept. This is not to say that sticking with the same concept and making the workflow tool easier to use would not reap benefits: I'll explain this a little better before moving on to the more radical approach.

We might make it easier or more intuitive for an astro to create a simple job of getting data and running it through a single tool. For instance, browse the registry, select a data source, right click and select an option to 'create job using this data'. User is taken into workflow with first step of getting data already created. They just need to fill in the query and add the analysis tool and then run the job. They can then check the results, come back to the workflow screen, modify the query or pick a different dataset (keeping the same query: need querying over data model or UCDs for this) and rerun. They can swap tools or maybe add in a different tool, building up the job, running it and changing it. Only when they're satisfied or want to go to lunch need they save the job.

I seem to have made the previous idea more radical than I thought. But maybe we could take the user out of the 'workflow' environment and put them into more of an experiment environment. This is another of the ideas I had for the AstroVRE: it works best from a browser where people expect and are used to a simple approach. The user does the same as above but the interface does a lot more remembering of what they have done and only presents them one step at a time. They have the option of stepping back to a previous stage, or combining results from two previous stages etc. But the user never sees a workflow building up: the user can choose to save previous steps as a workflow but otherwise they only see themselves working on one thing at a time (although we might offer separate context or experiment windows where they can work on two or more tasks at the same time).
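A rough sketch of how such an 'experiment' session might remember steps behind the scenes, written in Python. All names here (Step, ExperimentSession, do, back, as_workflow) are hypothetical and invented for this page; the sketch also simplifies by discarding later steps when the user steps back and takes a new path, and it does not attempt the 'combine results from two previous stages' option.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    action: str  # e.g. "query" or "run tool"
    detail: str  # free-text description of what was done


@dataclass
class ExperimentSession:
    steps: list = field(default_factory=list)
    position: int = -1  # index of the step the user currently sees

    def do(self, action, detail):
        """Record a new step after the current one and show it to the user."""
        # Simplification: steps after the current position are dropped.
        del self.steps[self.position + 1:]
        self.steps.append(Step(action, detail))
        self.position = len(self.steps) - 1
        return self.current()

    def back(self):
        """Step back to the previous stage, if there is one."""
        if self.position > 0:
            self.position -= 1
        return self.current()

    def current(self):
        return self.steps[self.position] if self.position >= 0 else None

    def as_workflow(self):
        """Export the steps up to the current one, only when the user asks."""
        return [(s.action, s.detail) for s in self.steps[: self.position + 1]]


session = ExperimentSession()
session.do("query", "select sources from dataset A")
session.do("run tool", "tool X on query result")
session.back()                                    # not happy with tool X
session.do("run tool", "tool Y on query result")  # try a different tool
workflow = session.as_workflow()
```

The user only ever sees the result of current(); the list of steps stays hidden until they choose to save it as a workflow.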

Outrageous

These are the bits those with low blood pressure should avoid: ideas that question perhaps a little too much...

Wrong audience

What if AstroGrid is listening to the wrong people? The project has made a great play of being science driven. But maybe it shouldn't be. Perhaps the project ought to stop listening to astronomers and, instead, listen to what the data centres and tools writers want. I know from my own experience that you do not give users what they ask for but give them what they need. The project seems not to have given astros what they need, so maybe it ought to focus on giving better tools and services to those who, in the past, have given astronomers what they need and leave it to those people to deliver the VO to the scientists.

Forget about applications

AstroGrid is attempting three pretty difficult tasks:
  • create the software for a scientific grid
  • deploy and operate a scientific grid
  • create all the user-facing software that makes up a "daily tool of choice" (06)

To do all of this and to do it well would take resources far beyond what PPARC is likely to assign to the project. Perhaps it is necessary to bite the bullet now, accept that we cannot do everything well, and commit to doing one thing well. My own preferred option is to do the first one well (accepting that we might need to do the second until existing resource centres pick up the baton).

But there is another option, that we take a vertical slice through the above and commit only to support the data side of the VO. So we only create software which will enable an Astro-DATA-Grid, deploy and operate that grid and write user-facing software which is the best in the world at allowing astronomers to locate and extract data and return it to their desktops. We'd have to forget about workflow (including job execution, scripting etc) and virtual storage and could get away with a reduced concept of auth/auth. We would focus on data access and registry along with all the issues around making them truly user-supportive, and in so doing, provide not just excellent software but an excellent service.

Solutions

I'll start this section off with just bullet points and try to expand them later:
  • provide tools for data centres to provide VO services from their web sites
  • persuade PPARC to fund a software development programme
  • tbc

Appendix A: Responses

A few comments on what others write during the review:

Andy

'We are BOTH an infrastructure project AND an operations delivery project, and ALWAYS have been'

No dispute there. But there is no reason why we also have to be the ones who write every piece of software that the users might ever need in their research. We cannot possibly do that and, in trying, will fail on the things that we are good at: infrastructure building and delivery. One would expect a new instrument project to also provide the software infrastructure that delivers the data to the astronomers but not to rewrite every piece of data visualisation and analysis software. There are people who do this better and as long as they have the tools and knowledge available to adapt or rewrite their tools, then they will do so.

This is what AstroGrid should be doing: providing a solid infrastructure, making sure that infrastructure is delivered and providing the tools writers with the knowledge and interfaces to let them do what they do best.

'Astronomers start by ... What we then want is for AstroGrid tools to pop-up when needed '

Follow-on from above - no, we should not expect them to have AstroGrid tools pop up: we could never write all the tools that people will want to use after they have their data. What we should aim for is that the tools the astronomer turns to are VO-aware and, preferably, AG-enabled.

'Workflow builder, Query Builder, VO Lookout, whatever - these are all just specialised tools. ... Unbundle.'

Yes, these are things that use the infrastructure components to put the astronomer one step further along what they want to do. And maybe, we should spend time making such tools to demonstrate what can be achieved, but not too much time. There is still a huge amount of work to be done on making the infrastructure stable and improving it: spending too much time on tools detracts from delivery of the infrastructure. So, more time on JES (Taverna) and less on the workflow builder; more time on DSA and less on the query builder etc.

I completely agree with unbundling them from the infrastructure (technically I think they are anyway, but it needs to be done in terms of perception and ownership). But, as I've just said, let's go further and unbundle them from the project as well.

'we should be trying to be Intel rather than Windows. But of course we could be in danger of losing the branding'

So, let's forget the branding. I completely agree with Andy that we should be more like Intel than Microsoft. This is what providing an infrastructure is about. It is surprising how many people I speak to about my work with JISC who simply reply 'What is JISC?'. Yet, the universities simply could not work without JANET, access to resources through Athens (soon Shibboleth), etc. This can be a problem when lack of name recognition makes it difficult to secure funding but the fact is that the people who need to know do know about JISC: network admins know all about JANET, librarians know all about Athens, and the VCs know all about JISC and what it does with its top-sliced money. So, as long as JISC can prove it is value for money to those who use its services and who are then asked this question at funding time, it should be okay.

Astronomers do not need to know what infrastructure underpins the VO in the UK. As long as the data centre admins (and their managers) and astro-devs know what it is that AstroGrid does and are convinced that it is value for money, then we ought to be able to persuade PPARC to continue funding the service and necessary software developments. And, if we can persuade these people to add an 'AstroGrid Inside' badge to their portals and tools, then we'll get a bit of glory as well.

Norman

First up, I'll make it clear that I agree totally with what Norman concludes: that AG should stick to the infrastructure and PPARC should fund tools development by many and various means, not expect it from AG. My reasons are not that the AG developers could not build these tools but that they have more than enough work to do getting the infrastructure built, deployed, maintained and kept up to date and I think it is an inefficient and ineffective approach (pace Starlink) to channel all astro-development efforts through one group.

But to take issue with Norman on one point:

'Groups 1 and 3 could consult for ever... without producing genuinely attractive applications '

The point being that developers could never understand what it is that astronomers want and need. This is quite untrue. I've heard many times over the last 5+ years how astronomy is special, unique and could not be understood by developers. Tosh! Even the comment itself is not unique: I've heard the same over the last 30 years from personnel and payroll people, financial and management accountants, bonds and currency traders, insurers, petroleum scientists etc. And they were all just as wrong. The job of an analyst/developer is to understand how their clients work and what they need to make some function more efficient. And, as long as the clients talk to them and give them access to the right people, then that is what gets done.

With AstroGrid, however, the job was to provide UK astronomers with access to the worldwide Virtual Observatory. But, in order to do this, since there was nothing already in place, we had in effect to first create a wholesale grid infrastructure for astronomy worldwide (yes, it was specified as UK but the UK could never work in isolation since most of the data and many of the applications were outside the UK). The AG developers have focused on infrastructure because this is what they were tasked to do, by me in the first instance and by the AGLI overall since they had agreed with me that this was a necessary first step.

Astronomy tools using the AG infrastructure have not been written, not because the 'Group 3' developers are incapable of doing so, but because they simply have not had the time to do so. Where they have had time to sit down with astronomers like Silvia and focus on tools then they have produced the goods. But I do not think the solution is to double the budget of AG so that they can do tools as well as infrastructure: we'd just end up with son-of-Starlink. PPARC has to produce an extra stream of money for individuals and groups to bid into for producing useful and meaningful tools for astronomy using the AG infrastructure. The more widespread and competitive the astro tools development base is, the better the tools provided to UK astronomy will be.

Footnotes

01: This conclusion is derived from usage statistics at http://software.astrogrid.org/launch-stats/ which show a large number of registered users and significant activity in usage but, when looking at users who log into the system more than an average of once a week (which they must do to use workflow or myspace), we see very few serious users each month, and even fewer who appear in more than one month. For a rudimentary analysis of logins, see AGStats.xls.
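The 'serious user' criterion in footnote 01 could be applied to a list of login records along the following lines. This is only a sketch in Python under stated assumptions: the record format (user name plus date), the function names and the sample logins are all invented here, and it is not the method used in AGStats.xls. 'More than an average of once a week' is read as more logins in a calendar month than that month has weeks.

```python
from calendar import monthrange
from collections import Counter
from datetime import date


def serious_users(logins):
    """Map (year, month) -> set of users averaging more than one login a week."""
    counts = Counter((user, day.year, day.month) for user, day in logins)
    result = {}
    for (user, year, month), n in counts.items():
        weeks_in_month = monthrange(year, month)[1] / 7  # days in month / 7
        if n > weeks_in_month:
            result.setdefault((year, month), set()).add(user)
    return result


def repeat_users(by_month):
    """Users who count as serious in more than one month."""
    seen = Counter(u for users in by_month.values() for u in users)
    return {u for u, months in seen.items() if months > 1}


# Invented login records, for illustration only.
LOGINS = (
    [("alice", date(2007, 1, d)) for d in (2, 9, 16, 23, 30)]
    + [("alice", date(2007, 2, d)) for d in (1, 6, 13, 20, 27)]
    + [("bob", date(2007, 1, d)) for d in (5, 19)]
)

by_month = serious_users(LOGINS)
repeats = repeat_users(by_month)
```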

02: See attachment to AstroGrid2Planning topic: Original AG proposal (MS Word document)

03: See Phase A Redbook (pdf document)

04: See AG2 proposal (pdf document)

05: One possible start might be the analysis I started (but which only Anita responded to) at TonyOnAstroWorkplan.

06: Note to AGSAG-FM11 on the Fitness Review: AGSAG-fitness-review.pdf
