r9 - 21 Dec 2005 - 05:51:54 - TWikiGuestYou are here: TWiki >  Astrogrid Web  >  DocStore > DataDocs > SuggestionsForADQL

Suggestions for Enhancements to ADQL

LIMIT n

I propose that an optional LIMIT n clause, where n is an integer, be added to the syntax. This purpose of this is to limit the number of rows returned by the query to the first n. There are at least two cases in which this is useful:

  • When doing data exploration, astronomers will often want to sample the effect of a query by getting a small number of results back, before repeating the query with no limit. Of course with a relational DBMS the order of rows is not something you can rely on, so any system of just returning the first N rows will return an arbitrary selection of them, perhaps those first loaded, or which happen to be first in the file.
  • Perhaps a more important use is when finding the top N in some sorted order, e.g. the top N faintest galaxies, or N stars with largest proper motion: in this case LIMIT n would be used on conjunction with an ORDER BY by clause.

Although this is not Standard SQL, many (most?) DBMS support this functionality in some way, for example:

DB2 SELECT columns FROM table WHERE ... FETCH FIRST n ROWS ONLY
Microsoft SQL Server SELECT TOP n columns FROM table...
Mysql SELECT columns FROM table WHERE ... LIMIT n
Oracle SELECT columns FROM table WHERE ROWNUM <= n
Postgres SELECT columns FROM table WHERE ... LIMIT n
Sybase-ASE SET ROWCOUNT n; SELECT ...

Notes added 11th May:

Wil O'Mullane points out that this is already supported in the latest ADQL definition, although the keyword TOP is used. I only chose LIMIT because it seemed slighlty more popular among existing DBMS, but TOP is just as good, and shorter.

Alberto Micol points out that ROWCOUNT in Sybase-ASE only works as expected without ORDER BY; with ORDER BY it only sorts the first n rows, not what one wants. There does not seem to be any simple way to return the top n or bottom n.

SELECT INTO newtable

The SQL language is designed to be result-orientated: a single statement can be quite complicated, and will return the results from a sequence of simpler operations. The astronomer typically works incrementally, selecting a subset from a large table, examining this, selecting a smaller subset from it (if the first selection was satisfactory), sorting the results in order, checking the top few results, and so on. This is the complete opposite of the SQL style. It may be more efficient to do everything in one statement, but when the exact sequence of operations is a novel one, getting it right may take some experimentation. It seems desirable to support this incremental use of databases.

The simplest way of doing it is to allow the user to create a new table from the results of a given selection (or join, etc.). The could be supported by allowing a SELECT statement to have optionally an INTO newtable clause.

Again (and rather remarkably) this is not supported by Standard SQL, but most DBMS support it in some way or another. For example:

DB2 CREATE TABLE newtable (col1 type1, ...); INSERT INTO newtbl SELECT ...
Mysql INSERT INTO newtable SELECT ...
Oracle ??
Postgres SELECT ... INTO TABLE newtable
Sybase-ASE SELECT ... INTO newtable

It is possible that we will want to extend the syntax to include a specification of an output file in some external managed filestore, such as MySpace.

EXPLAIN query

When working with large tables, it can be hard to estimate how long a given query will take. Depending on whether an index exists to accelerate the access or not, the time can be seconds or hours. To support this many DBMS support the prefixing of a value query by the keyword EXPLAIN which asks the DBMS to explain how it plans to execute this query. The results of the EXPLAIN command are very diverse, and may be a small table, or a text string. I suspect that it will be sufficient for an initial implementation just to return to the user the text resulting from the query, and have some guidance as to its interpretation in the help files or FAQ. Perhaps a more advanced system could try to interpret the results of EXPLAIN and transform them into a common form. This would be most useful if it could predict the likely execution time. Even without this, I think EXPLAIN would be a valuable facility.

DESCRIBE table

Before the user can formulate a sensible query, there must be some information available on the names of the columns and their data types. In practice is is also important to know which columns are indexed. We have been assuming that this information will be available from the VO Registry in some way, but that is only possible if there is some way for the Registry to extract this information from the DBMS. Whether the user gets the column details directly or indirectly, there is a clear need for a simple mechamism for obtaining it. Again there is (as far as I am aware) nothing in the SQL Standard to support this, but most DBMS support the functionality. For example:

DB2 DESCRIBE TABLE table
Mysql DESCRIBE table
Oracle DESC table
Postgres \d table (from the command-line)
Sybase-ASE sp_help table

Martin Hill has commented that this facility is not needed, as the functionality will be provided by a metadata query. If metadata queries do become standard within ADQL-speakign systems, then the DESCRIBE command will indeed be superfluous.

-- ClivePage - updated 11 May 2004

Edit | Attach | Printable | Raw View | Backlinks: Web, All Webs | History: r9 < r8 < r7 < r6 < r5 | More topic actions
 
AstroGrid Service Click here for the
AstroGrid Service Web
This is the AstroGrid
Development Wiki
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback