<< &< &> >> @ ~= &&The meaning of these operators is made clear in the tables in the User's Guide section 6.9. The only one of these operators which can be used on circles is
@, and then only to find whether a point is within the circle or not; there does not seem to be any simple way of testing whether two circles overlap. In view of this, I chose instead to use the box data type, where the operator && can be used to test for overlap. A few experiments showed that this did indeed do what we wanted, and that R-tree indexing could be used to conduct efficient range searches and spatial joins.
To use the box datatype it is necessary to draw a rectangular box around each error region, whether it be circular, elliptical, or whatever. The database, with the assistance of R-tree indexing, can find overlapping boxes easily, and this does the bulk of the work. A bit of extra calculation is then necessary to determine which of these overlapping boxes actually involves an overlap of the circles (or whatever shapes) that they have been drawn around, but this only takes a little extra computer time.
| column name | data type | physical units | Comments |
|---|---|---|---|
| ra | double precision | degrees | Right ascension of star |
| decl | double precision | degrees | Declination of star |
| poserr | real | degrees | Position error on above coordinates (assumed equal) |
| vmag | real | magnitudes | Visual magnitude of star |
| errbox | box | degrees | Rectangular box around the error circle |
CREATE TABLE mystars (ra DOUBLE PRECISION, decl DOUBLE PRECISION, poserr REAL, vmag REAL, errbox BOX) ;Notes:
ALTER TABLE mystars ADD COLUMN errbox BOX ;This seems to be a fast operation; I suspect the hard work of re-structuring each row happens only when the new column is populated with data.
INSERT INTO mystars (ra, decl, poserr, vmag) VALUES (12.30, -4.56, 0.01, 12.9) ;A single INSERT statement can insert values into many rows, the manuals explain the syntax. When loading bulk data, however, it is much faster to use the COPY command to load from a suitably formatted external file. Postgres has a defined binary format, but it is somewhat difficult to construct, and I always loaded text files, since they are easier to construct and to check. The default is for fields to be separated by tabs, but an alternative field separator can be specified. The default value for null fields is "\N" but again an alternative can be specified. The equivalent COPY command for a tab-separated text file would then be something like this:
COPY mystars(ra, decl, poserr, mag) FROM '/home/cgp/data/mystars.tsv' ;In both INSERT and COPY commands you do not need to list the column names if you are inserting or loading data into all columns. There are two ways of filling the errbox column with the spatial data. If you are constructing the text file yourself you may wish to put the error-box data in it as well. There is a defined format for boxes, each corner is given as an (x,y) pair, with a comma between the two values, one for each opposite corner, and another pair of parentheses around the whole thing. To make this clearer in the example below I have used the vertical bar symbol as the field separator, then a line in the text file might look like this:
12.30|-4.56|0.01|12.9|((12.29,-4.57),(12.31,-4.55))Note: here I have simply added/subtracted the error-radius, 0.01, to/from the RA value. To be correct we need to take into account the distortion of the scales at non-zero declinations. Then the copy command would look like this:
COPY mystars FROM '/home/path/mystars.csv' WITH DELIMITER '|' ;The alternative is to populate the errbox column with an UPDATE statement, calculating the box corners from the (ra,decl) value and the error-radius. This is slightly messy as one has to use the point and box functions, e.g. like this:
UPDATE mystars SET errbox = BOX(POINT(ra+poserr/COS(RADIANS(decl)),decl+poserr),
POINT(ra-poserr/COS(RADIANS(decl)),decl-poserr)) ;
Here I have made a crude allowance for the fact that the length of the box along the RA axis needs to be streched at declinations away from the equator.
Then what you get from a select statement is:
cgp=# select * from mystars; ra | decl | poserr | vmag | errbox ------+-------+--------+------+--------------------------------------------------------------------------- 12.3 | -4.56 | 0.01 | 12.9 | (12.3100317540404,-4.55000000022352),(12.2899682459596,-4.56999999977648) (1 row)
CREATE INDEX myindex ON mystars USING RTREE(errbox) ; ANALYZE mystars ;
SELECT ra,decl,vmag FROM mystars WHERE errbox && BOX(POINT(179.0,1.1),POINT(179.1,1.2)) ;In a similar way you can do a spatial join between two tables which have compatible columns of spatial type. It is only necessary to have an R-tree index on one of the tables, preferably the larger one. Suppose you have two tables called mystars and usno, the join syntax might be something like this:
SELECT mystars.ra,mystars.decl,usno.ra,usno.decl FROM mystars, usno WHERE usno.errbox && mystars.errbox ;If the columns have unambiguous names you do not need to precede them with the table name. This selection will return rows where the rectangular boxes overlap; if you want only those where the circles overlap, within limits of the two error-radiii, then a bit more trig is needed. In the case of astronomical or geographical coordinates you need to compute the great-circle distance, not the simple cartesian one. Unfortunately the simplest expression for great-circle distance does not work at all well for small distances; a better one is that based on the haversine formula. This is used here:
SELECT mystars.ra,mystars.decl,usno.ra,usno.decl FROM mystars, usno WHERE usno.errbox && mystars.errbox AND DEGREES(ASIN(SQRT(POW(SIN(RADIANS(mystars.decl-usno.decl)/2) + COS(RADIANS(mystars.decl)) * COS(RADIANS(usno.decl)) * POW(SIN(RADIANS(mystars.ra-usno.ra)/2)))) < mystars.poserr + usno.poserr ;
UPDATE statements which populate the errbox column will cause every row to be duplicated. There seems to be two ways of recovering this space VACUUM FULL operation: this actually releases space, whereas a VACUUM alone moves it to the end, where it could be used only by new records in the same table.
CLUSTER command, which creates a new table (and index) and deletes and renames the old one.
ALTER TABLE ADD COLUMN option, then an UPDATE to populate the column. Then either CREATE INDEX to create the R-tree, and then CLUSTER on that index to produce a table copy.
VACUUM FULL to remove the obsolete rows, then CREATE INDEX on the clean version, then optionally CLUSTER on the result.
CLUSTER may be faster than VACUUM FULL for large tables.
What is clear is that it is better to load the table with a ready-populated errbox column, as populating this afterwards causes a lot of disc space to be wasted, and recovering it is tedious.
> creatdb mydb
> psql mydb
Welcome to psql 7.3.4, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit
mydb=#
At the psql command prompt (which reminds you of your database name) you can enter the following. It is sensible, of course, to have the CREATE TABLE command in a text file, which can be loaded with just \i filename
CREATE TABLE twomass (
ra double precision,
decl double precision,
err_maj real,
err_min real,
err_ang smallint,
designation character(17),
j_m real,
j_cmsig real,
j_msigcom real,
j_snr real,
h_m real,
h_cmsig real,
h_msigcom real,
h_snr real,
k_m real,
k_cmsig real,
k_msigcom real,
k_snr real,
ph_qual character(3),
rd_flg character(3),
bl_flg character(3),
cc_flg character(3),
ndet character(6),
prox real,
pxpa smallint,
pxcntr integer,
gal_contam smallint,
mp_flg smallint,
pts_key integer,
hemis character(1),
date date,
scan smallint,
glon real,
glat real,
x_scan real,
jdate double precision,
j_psfchi real,
h_psfchi real,
k_psfchi real,
j_m_stdap real,
j_msig_stdap real,
h_m_stdap real,
h_msig_stdap real,
k_m_stdap real,
k_msig_stdap real,
dist_edge_ns integer,
dist_edge_ew integer,
dist_edge_flg character(2),
dup_src smallint,
use_src smallint,
a character(1),
dist_opt real,
phi_opt smallint,
b_m_opt real,
vr_m_opt real,
nopt_mchs smallint,
ext_key integer,
scan_key integer,
coadd_key integer,
coadd smallint
) WITHOUT OIDS;
The "WITHOUT OIDS" option makes for a slightly more compact table, I think, and has no ill-effects that I can discern.
These column names are those listed in the files which come with the 2MASS distribution from USNO. The meaning and units can be found from the 2mass web-site. Another useful source of information is [[http://vizier.u-strasbg.fr][Vizier] where the ReadMe file (see link in the coloured bar near the top) gives details. If you check the UCD box, you can see the Unified Content Descriptor (UCD) of each column too.
> zcat psc*.gz | psql -c "copy twomass from stdin with delimiter '|'" mydb
ALTER TABLE twomass ADD COLUMN errbox BOX ;
UPDATE twomass SET errbox =
BOX(POINT(ra + err_maj/(3600.0*COS(RADIANS(decl))),
decl + err_maj/3600.0),
POINT(ra - err_maj/(3600.0*COS(RADIANS(decl))),
decl - err_maj/3600.0)) ;
If you think these limits, set by the err_maj column, are too conservative, then modify the formula accordingly. It is sensible to have the minimum bounding rectangle in the errbox to be larger than any likely search radius that you want to use, and efficiency does not suffer all that much if it is a bit larger than necessary (up to the point that significant numbers of these boxes overlap).
ALTER TABLE twomass ALTER COLUMN errbox SET NOT NULL; -- only needed for clustered index CREATE INDEX tbox ON twomass USING RTREE(errbox); CLUSTER tbox ON twomass; -- only needed to cluster the index ANALYZE VERBOSE twomass;The ANALYZE command is needed after any index creation, or any significant change to one. The VERBOSE is optional, but gives some information on the size of the index.
SELECT COUNT(*) FROM twomass WHERE BOX(POINT(123.4,45),POINT(123.9,46)) && errbox ;
CREATE INDEX tdec ON twomass(decl);If you do this after the ANALYZE command, then you will need to run ANALYZE again. The advantage of a dec index is that it supports simpler queries like this:
SELECT COUNT(*) FROM twomass WHERE ra BETWEEN 123.4 AND 123.9 AND decl BETWEEN 45 AND 46;But these will execute more slowly. It is instructive to compare the performance of these indices: the dec index always performs at much the same speed, as it has to physically retrieve lots of rows from the table, and then throw most of them away when it later filters on RA. The spatial index gets faster if it is re-run, as most of the necessary rows are already in the disc cache.
![]() |
Click here for the AstroGrid Service Web |
This is the AstroGrid Development Wiki |
|