Analysis of CEA Descriptions in the Registry
As part of designing and implementing an improved UI for launching CEA applications, I gathered some statistics from the current registry about the current set of CEA applications.
Sizes
I ran the query cea-paramsize.xquery against Galahad using AR. It generated these results: size.html.
Number of interfaces
An application can provide a number of interfaces. Each interface is a set of input and output parameters which can be used together to invoke the application.
The vast majority of applications have one or two interfaces. The number with two interfaces is a little lower than the number with a single interface. The maximum number of interfaces provided by a single application is 10.
Number of input parameters
The majority of applications have between 1 and 5 inputs. Up to 10 parameters is quite common. The highest number of inputs is 30.
A few have no input parameters, but I suspect these might just be test apps.
Number of output parameters
The overwhelming majority of applications have a single output. Up to 4 outputs is quite common. The maximum number of outputs is 16.
Implications for UI
The UI should work best with the average sizes found, but must also work acceptably with the maximum values found. The design target is therefore an application that provides a choice of 2 interfaces, with up to 10 input parameters and 4 output parameters.
Parameter Types
The registry schema for CEA allows each parameter to be described using the following attributes:
- name - used internally to refer to the parameter from interfaces
- type - the type of this parameter; can only be one of the types defined in the CEA specification: integer, real, complex, double, text, boolean, anyURI, anyXML, VOTable, RA, Dec, ADQL, binary, FITS
- sub-type - some additional type constraint for this parameter
- accepted-encodings - information about what representations / formats are acceptable
- UI Name - a human-readable name to use in the UI
- UI Description - a human-readable description of the parameter
- UCD - a UCD that describes the parameter
- Default Value - a default value to be displayed in the UI
- Units - some description of the units that the parameter value is expected to be in
- Options - an enumeration of permitted values. If provided, the parameter must be one of these values. If unspecified, the parameter can take any value.
I wanted to see how applications were describing their parameters - what parameter types were being used, and which of the other attributes. I ran the xquery cea-survey.xquery against Galahad using AR. The results generated are in result.html.
Types
From a UI point of view, the most useful attribute of a parameter is its type. Using this, a UI can display different widgets to edit that parameter. For example, I'd planned to display:
- astroscope-like input widget for RA,Dec values - which would convert between decimal degrees and sexagesimal, and convert object names to positions.
- checkbox for a boolean parameter
- text entry box, and verify that a number is entered for one of the numeric types
- larger text field for the anyXML and VOTable types
- file-chooser to select an external file for the binary and FITS types.
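The decimal/sexagesimal conversion mentioned for the RA,Dec widget can be sketched as below. This is only an illustration: the function names are made up for this page, RA is assumed to be shown in hours and Dec in signed degrees, and object-name resolution is left out because it needs an external service.

```python
def deg_to_sexagesimal(value_deg, is_ra):
    """Format decimal degrees as sexagesimal text (hypothetical helper).

    RA is shown as hours:minutes:seconds (15 degrees per hour),
    Dec as signed degrees:arcminutes:arcseconds.
    """
    total = abs(value_deg) / 15.0 if is_ra else abs(value_deg)
    # Work in whole hundredths of a second so rounding never yields "60.00".
    centi = round(total * 360000)
    units, rem = divmod(centi, 360000)
    minutes, rem = divmod(rem, 6000)
    text = f"{units:02d}:{minutes:02d}:{rem / 100:05.2f}"
    if is_ra:
        return text
    return ("-" if value_deg < 0 else "+") + text


def sexagesimal_to_deg(text, is_ra):
    """Parse 'hh:mm:ss' or 'dd mm ss' text into decimal degrees."""
    text = text.strip()
    negative = text.startswith("-")
    parts = text.lstrip("+-").replace(":", " ").split()
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected one to three sexagesimal fields")
    # Missing minutes or seconds are treated as zero.
    values = [float(p) for p in parts] + [0.0] * (3 - len(parts))
    total = values[0] + values[1] / 60.0 + values[2] / 3600.0
    if is_ra:
        total *= 15.0
    return -total if negative else total
```

A widget built on this could keep the decimal value as the parameter value and use the sexagesimal text only for display and entry.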
However, the current CEA registrations suggest that this won't work. Here's the number of parameters of each type in the registry at the moment:
| type name | occurrences | notes |
| text | 659 | overused, inputs mostly, used in output when there's a type union |
| double | 363 | inputs only |
| ADQL | 90 | inputs only |
| binary | 62 | inputs and outputs |
| integer | 60 | inputs only |
| FITS | 52 | inputs and outputs |
| boolean | 45 | inputs only |
| VOTable | 39 | inputs and outputs |
| Dec | 3 | inputs only |
| RA | 3 | inputs only |
| anyURI | 3 | |
| real | 3 | inputs only |
| anyXML | 0 | never used |
| complex | 0 | never used |
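To put the counts in proportion, the short sketch below turns the table into percentage shares. The numbers are copied from the table; nothing else is assumed.

```python
# Occurrence counts copied from the table above.
counts = {
    "text": 659, "double": 363, "ADQL": 90, "binary": 62, "integer": 60,
    "FITS": 52, "boolean": 45, "VOTable": 39, "Dec": 3, "RA": 3,
    "anyURI": 3, "real": 3, "anyXML": 0, "complex": 0,
}

total = sum(counts.values())
shares = {name: 100.0 * n / total for name, n in counts.items()}

# Print the types from most to least used, with their share of all parameters.
for name, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{name:8s} {n:4d} {shares[name]:5.1f}%")
```

Out of 1382 parameters, text accounts for roughly 48% and text plus double for about 74%.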
text is overused. Often this is just plain incorrect (a parameter called Images whose description is 'A votable of image details'; a parameter called numPoints, whose description is 'the number of points: 2 or 3'). In other cases text is used to point to an external file or resource - here anyURI would be better.
Sometimes text is used because the parameter doesn't fit into one of the other formats - for example dateTime parameters, or a parameter called POS, a comma-separated RA and Dec, which occurs often (although why wasn't this represented as separate RA and DEC parameters?). In other situations text is used when the parameter has a varying type - for example the Result output parameter, which may be a votable or csv, depending on what inputs were provided.
This suggests that a future UWS replacement for CEA should either have a richer type system (that allows dates, positions, and maybe composite and variant types), or it should drastically simplify the type system - to just boolean, string, number. The current types certainly aren't being used fully.
The double type is used 80 times for a parameter called RA and for a parameter called DEC. These would need special treatment to detect when to display a coordinate-input widget rather than a standard input widget. I wonder why the RA type wasn't used instead.
Applications which expect an ADQL parameter are handled by the QueryRunner - so not an issue here.
The integer type is sometimes used for 0/1 flags, where a boolean parameter would be better.
RA and Dec are hardly used. I don't understand why this is. Maybe the spec is unclear as to whether this means sexagesimal or decimal degrees, so application deployers prefer 'double', which forces the representation to be decimal degrees.
Other attributes
name is required, and always occurs.
The accepted-encodings attribute was never used.
sub-type was only rarely used, and when it was, it noted Java type information (e.g. java.lang.String for a parameter of type text) which didn't constrain the type further.
UI Name was used fairly well - some of the parameter names are quite long, but this is maybe unavoidable.
UI Description was well used. Long texts are common. Often contains format, units and type information. Sometimes there's a bit of html formatting in the description.
UCD occurs infrequently - both UCD1 and UCD1+. Its main use is to mark that a parameter of type double is really an RA by using the UCD POS_EQ_RA_MAIN.
Units has a few occurrences of 'arcsec', 'deg', 'Solar Masses', 'Myr'.
Options are not used as much as they could be. Instead UI Description is relied upon.
Implications for UI
It's doubtful that the registrations for these applications will improve quickly, so the TaskRunner UI must make the best of what it's given. Ho hum.
- So many parameters are poorly specified that using type by itself to display different input widgets or input constraints won't work.
- UI Description is integral to usability, as it's often the only guide provided for that parameter. So it needs to be displayed prominently - a tooltip won't do. Tricky, as it's of unlimited length.
- UI Name should be able to display long parameter names (up to 35 chars).
- No point using accepted-encodings or sub-type.
- Could maybe use UCD as a hint for when to display a coordinate-input widget. Only possible if both a POS_EQ_RA_MAIN and POS_EQ_DEC_MAIN are provided.
- Unsure what to do about the POS parameter - no foolproof way of detecting this.
- Maybe include a separate coordinate conversion / object name resolving widget in DesktopAstroGrid - this could be used to get the correct values, which can then be pasted into ad-hoc fields like POS.
- As types are so misapplied, probably can't check the input and reject syntactically invalid entries. Best that could be done would be to display a non-interrupting warning when something unexpected is entered (a float for an integer, for example.)
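Two of the points above, the UCD hint and the non-interrupting warning, can be sketched roughly as follows. This is not TaskRunner code: the widget names, the dictionary layout of a parameter, and the choice of a dropdown for parameters with Options are all assumptions made for illustration.

```python
RA_UCD = "POS_EQ_RA_MAIN"
DEC_UCD = "POS_EQ_DEC_MAIN"
NUMERIC_TYPES = {"integer", "real", "double"}


def choose_widgets(params):
    """Map each parameter name to a (hypothetical) widget name.

    params is a list of dicts with 'name' and 'type', and optionally
    'ucd' and 'options'. The coordinate widget is only used when both
    the RA and the Dec UCD are present among the parameters.
    """
    ucds = {p.get("ucd") for p in params}
    has_position_pair = RA_UCD in ucds and DEC_UCD in ucds
    widgets = {}
    for p in params:
        if has_position_pair and p.get("ucd") in (RA_UCD, DEC_UCD):
            widgets[p["name"]] = "coordinate"
        elif p.get("options"):
            widgets[p["name"]] = "dropdown"
        elif p["type"] == "boolean":
            widgets[p["name"]] = "checkbox"
        else:
            widgets[p["name"]] = "textfield"
    return widgets


def soft_check(param_type, text):
    """Return a warning string, or None if nothing looks unexpected.

    The value is never rejected; the caller just shows the warning.
    """
    if param_type not in NUMERIC_TYPES or text.strip() == "":
        return None
    try:
        number = float(text)
    except ValueError:
        return "expected a number"
    if param_type == "integer" and not number.is_integer():
        return "expected a whole number"
    return None
```

Because so many parameters are registered as text, soft_check stays silent for that type, and everything that isn't recognised falls back to a plain text field.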
-- NoelWinstanley - 03 Jul 2007
Comments
Main.TonyLinde [20070704] : my original idea for app UIs was that the descriptor of an application (registry entry probably) would contain an XForms/CSS description of the interface for the application. This would allow a rich interface designed by the person who knew the application best, its creator. I don't know if there are Java libraries that will interpret and display such interfaces but I still think it is worth looking into.
Main.GuyRixon [20070704] : I think we can improve a lot of the descriptions. It's actually a formal task for C6 to improve the registry. Most of the descriptions were registered by AstroGrid staff, so we have control. Others were harvested from registries that have gone away (tests and workshops) so can be deleted. Most of the live ones registered by "customers" are DSA applications. I predict that the latter will get updated soon when we release TAP support; I think that will be a popular upgrade to DSA/catalogue.
Once we have cleared out the dead registrations, I think it's worth going through the rest and emailing the contact persons if they are wrong.
For the future, I think if the UI had the type checking and specialized widgets, then the persons registering the applications might be inspired to put correct metadata. At the moment, when they test with the workbench, they see no gain to putting the extra metadata and no loss when they get it wrong. If they thought they should have a coordinate-entry dialogue and they don't get it then they would probably check their installation.
Main.NoelWinstanley [20070704] : Guy - yep, you're right. Hopefully using the metadata more in the UI will drive improvements, and make it easier to see what people need to add. I was only querying the registry for active records - I'm surprised that this includes registrations from tests and workshops.
Tony - it's a most sensible idea, but we're miles away from this now. As well as an XForms 'player' in the client (and I can't find an XForms-to-Java-Swing library that's still alive), we'd need to provide an XForms interface designer which application creators could use. It's a pity that XForms isn't mature enough yet.
Main.PaulHarrison [20070705] : This is an extremely useful/interesting analysis. First some general points:
- The schema are more than 3 years old and still basically at the 0.2ish "first try" version, and it shows - there is a window of opportunity with the registry upgrade that is happening to move this to a version 1.0 that has some of the faults of the schema removed. It is better to try to make simultaneous changes to the schema and the UI to make this work better than only to patch up the current UI to work around some of these problems. In fact it is only when actually trying to create the UI that most of the features of the CEA Application definition really get exercised.
- I have been sat on a v1.0 schema for some time that is viewable at http://www.eso.org/~pharriso/ivoa/schema/CEABase.xsd. This is a result of my trying incrementally to improve the schema over the last 2.5 years, as issues arose. I have tried to keep a fair degree of backwards compatibility, whilst at the same time trying to remove inconsistencies, and respond to suggestions for additional features. It would have been too disruptive to try to introduce this earlier, but with the decision to drop JES and the V1.0 registry upgrade and the Workbench upgrade, there is a "window of opportunity".
I originally hoped that CEA could become a sort of meta-language for
describing the interfaces/parameters of other IVOA standard services, so that
users might be familiar with the concepts, but as it is clear that this has
not/will not happen, then I think that the UI should try to avoid any mention
of CEA as a term (including finding what are "pure" CEA applications). All that
the end user cares about is that they are sending some parameters to a
"service" - this was the original inspiration for CEA anyway - to abstract away
the underlying implementation details of how the parameters actually got to the
application.
Some Comments on Noel's analysis.
CEA is undoubtedly underdocumented, which is an unfortunate consequence of
the timing of my move to ESO. There is almost no advice to authors on how to
create an application description, and so it is not surprising that there are
many poor descriptions. It is also true (as Guy pointed out) that the old Workbench UI (which was
essentially identical to the Web Portal UI) did not help in that it
did not make any difference what type was chosen for the
parameter. This is one aspect of the UI that was too "close" to the
underlying schema, in the sense that it just mimicked the structure of a tool
document, where every parameter value is simply a string. This was never the
intention in CEA; the UI was always supposed to add richness to its behaviour
that could be inferred from the application description metadata. Another
area where the current UI is too "close" to the schema is how the user specifies the repeatable/
indirect qualities of a parameter - different visual cues to indicate these
properties and different gestures to change them than clicking on a checkbox
would be preferable in a new UI.
The CEA "type system" was always a rather loose thing - however there was
always an intention to try to allow the type to convey some information
about the "container" that the data lived in, as well as giving sufficient
information to allow the user interface to be able to perform some
validation/user assistance. In fact the CEA type is probably a little
closer to the "serialization" concept in the IVOA data models.
RA and DEC - are a good use case - They are "types" that have some very
particular "astronomical" semantics which was why I introduced them in
CEA - however, I have decided to drop them in the V1.0
schema, as they were obviously too specific. In fact if the application
is expecting an RA as a floating point value then I think that the "best
practice" is to signal that via the appropriate UCD. There are however
cases where the application expects a sexagesimal string, and for those
I have introduced the "angle" type, but again if that angle is intended
to be an RA then it should be signaled with the UCD.
Dates are another useful "type" that was stupidly entirely absent from the
original CEA - they are in the current draft in two forms "date" and
"mjd" - it is clear just thinking about this posting that mjd probably
does not belong as a "type" - it should just be a double with an MJD UCD
if such a thing exists.
misc comments
- interfaces - clearly if there is only one interface to an application the UI should not even show that there is a possibility to choose.
- Another feature that the UI could offer by utilizing the UCD definitions for a parameter is to store the 'current' (i.e. last used) value for each UCD type and offer a shortcut to set a parameter to that value.
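The 'last used value per UCD' idea could look something like the sketch below. The class and method names are invented for illustration, and persistence between sessions is left out.

```python
class LastValueByUcd:
    """Remember the most recently used value for each UCD (hypothetical helper)."""

    def __init__(self):
        self._values = {}

    def remember(self, ucd, value):
        # Parameters without a UCD cannot be matched later, so skip them.
        if ucd:
            self._values[ucd] = value

    def suggest(self, ucd):
        """Return the last value used for this UCD, or None if there is none."""
        if not ucd:
            return None
        return self._values.get(ucd)


# After running one application with an RA of 187.5 degrees ...
store = LastValueByUcd()
store.remember("POS_EQ_RA_MAIN", "187.5")
# ... the UI can offer that value for the next parameter with the same UCD.
shortcut = store.suggest("POS_EQ_RA_MAIN")
```

Since UCDs are only provided for a minority of parameters, the shortcut would simply be absent for most fields.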
GUI design/Customization
An XForms representation for the interface has also been at the back of my mind for a
while, but as Noel has pointed out all of the Java XForms effort seems to have
shifted towards producing a server side engine with javascript front ends.
Anyway even XForms might not be what people want.
It is informative to see what VO theory people have been doing in this area. I
think that Frank Le Petit's
http://www.ivoa.net/internal/IVOA/InterOpMay2007Theory/VOParis_Simulation_IVOA07.pdf work is one of the best endorsements for AG as a whole.
They published their code as a CEA service, and then used the AR to allow
them to write a custom UI to run the application. The sorts of things that
they did would not be possible with Xforms anyway as it involved creating
plots to help choose the input parameters, and when I spoke to him about it
Frank said that he would always eventually want to write a custom GUI even if the
"standard" Workbench GUI were more configurable. This does not mean of course
that the "standard" workbench GUI should not be better than it is at the
moment.
In addition the group of Fred Boone
http://www.france-ov.org/twiki/pub/GROUPEStravail/WorkflowReunion2/fredboone.pdf
has some theory code for which he decided to write a "CEA-like" description schema -
it contains some useful ideas, and certainly has a more sophisticated
auto-generated GUI than the CEA gui currently in workbench. e.g.
- The ability to indicate tabs for groups of parameters.
- representing groups of repeatable parameters as tables.
The new CEA schema introduces a structuring of parameters into groups that, although
originally intended as simply a way of grouping sets of repeatable parameters, could
also be interpreted by the GUI for the case where
- there was no repeat, as just a tabbed group of parameters
- where there was a repeat, as a table structure.
However, this is probably for the next version of the GUI.
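As a rough sketch of that interpretation, a GUI could pick the layout for a grouping of parameters from whether it repeats. The dictionary keys used here ('name', 'parameters', 'repeatable') are illustrative only and are not taken from the CEA schema.

```python
def layout_for_group(group):
    """Choose a layout for one grouping of parameters.

    group is a dict with a 'name', a list of 'parameters' (names) and a
    'repeatable' flag. Repeatable groups become tables with one column per
    parameter; other groups become a tab holding one field per parameter.
    """
    names = list(group["parameters"])
    if group.get("repeatable"):
        return {"kind": "table", "title": group["name"], "columns": names}
    return {"kind": "tab", "title": group["name"], "fields": names}


sources = {"name": "Sources", "parameters": ["ra", "dec", "radius"], "repeatable": True}
output = {"name": "Output", "parameters": ["format", "target"], "repeatable": False}
layouts = [layout_for_group(g) for g in (sources, output)]
```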
-- PaulHarrison - 05 Jul 2007