Guide to Celestial Coordinates, Cone-search, etc. for Non-astronomers
This guide will start out with very basic information but will get steadily more complicated. Please skip through the earlier parts if they cover things you already know about. I am writing them partly because, now that
AstroGrid is getting rid of almost everyone who understands these things, they really need to be set out somewhere.
Right Ascension and Declination
When we look at the night sky we just see points (and a few extended objects like the Moon) and cannot measure their distances, though they cover a very large range. So to a first approximation we can think of the sky as a perfect sphere, and just need a convenient coordinate system to specify the position of objects in the sky, assuming that they are all at infinite distance on the "celestial sphere". The term "fixed stars" may not be quite true, but apart from planets and other things within the Solar System, most objects of interest to astronomers don't move about except to a tiny extent. The coordinate system most often used is a system of spherical-polars, exactly analogous to those used on the surface of the earth.
- Right Ascension (often abbreviated RA) is equivalent to longitude on the Earth
- Declination (or DEC) is equivalent to latitude on the Earth.
Declination is measured relative to the celestial equator, which is just the projection of the Earth's equator into space, so it runs from +90° at the North Celestial Pole (not far from the Pole Star) to -90° at the South Celestial Pole (within the constellation of the Southern Cross). Just occasionally astronomers want to use a number scale which is just positive, and they use South Polar Distance (SPD) which goes from 0° to +180° but if you are lucky you will never come across this.
Right Ascension has to be measured relative to some zero point, the equivalent of the Greenwich Meridian. Back in the dark ages (before street lights were invented) astronomers decided to put the zero point at one of the two intersections of the celestial equator with the ecliptic (the plane in which the Sun and, on average, the planets appear to travel along). The chosen intersection is known as the First Point of Aries. It no longer lies very close to the constellation of Aries (but don't tell the astrologers as they get very upset when anyone points out these inconvenient facts).
Precession of the Equinoxes
The reason for the movement of the zero point of RA is that the Earth's pole is precessing or wobbling, with a period of around 26,000 years: this is known as the "precession of the equinoxes". The Earth's orbit around the Sun, however, is much more stable. As a result, the zero point of the coordinate system is continually moving. It is about as sensible as it would be to draw up maps of the Earth using a coordinate system not based on some fixed point like Greenwich but a on boat sailing down the Thames Estuary. This is a crazy system, but astronomers seem to be able to cope, more or less.
What it means is that to specify coordinates properly you need not only (RA, DEC) but also the date applying to the coordinate system, sometimes known as the "equinox". The most popular one at present is "J2000" - the J stands for Julian epoch, and the 2000 means the start of the year 2000. Until a few years ago measurements using B1950 were common, the B stands for the Besselian epoch, and refers to a Besselian Year (Google for the definition if you really want it).
If you have coordinates given in B1950 and you want them in J2000 (or vice-versa) then there are standard formulae (or bits of code) which can convert them for you. The Starlink library SLALIB, written by Patrick Wallace, is an excellent source of conversions like this.
Sexagesimal Coordinates
Just to make life difficult, astronomers are also rather partial to the use of sexagesimal notation, i.e. using base 60. These were invented by the Babylonians (see
http://www.spirasolaris.ca/sbb1sup1.html) who presumably had use of more fingers and toes than us. Thus instead of expressing a declination as 12.3456°, it would be expressed as 12° 20' 44" - or 12 degrees, 20 arc-minutes, and 44 arc-seconds. Of course the rest of us still have to put up with sexagesimals for time of day - the French revolutionaries tried to decimalise time after they had metricated other measurements, but for some reason it never caught on, unfortunately.
What is even more confusing is that when expressing Right Ascension astronomers still reckon in times, since before computers, the way to measure the coordinates of some object just passing through your field of view would have been to measure the time difference between it's passage and that of the First Point of Aries. Thus RAs are commonly shown
not as degrees, arc-minutes, and arc-seconds, but as hours, minutes, and seconds. As a result they have a range from 0 to 24 hours. To convert from decimal hours to decimal degrees is easy enough, just multiply by 15.
For both RA and DEC it is common to express the sexagesimal values as a string with colons (or sometimes just spaces) between components, thus 12° 20' 44" could be shown as 12:20:44. Declination values generally have a leading sign, even if positive, as it helps to distinguish them from Right Ascensions, which never need a sign.
Almost everyone who has ever written a program to convert sexagesimal coordinates to decimal ones has, at least once, produced code which falls foul of case of minus zero degrees. For example if you take "-00:15:35" and do the conversion in a simple minded way, the minus sign being attached to a value of zero degrees will be ignored, and the overall result will be positive when it should not be. You would be surprised to find how many published catalogues turn out to have an empty strip exactly one degree wide just below the celestial equator. Please try to avoid this mistake: libraries like SLALIB do these conversions efficiently and safely - if you must write your own routine at least copy the workings of a good one.
Oh, one other oddity: because the sky rotates east to west, the RA values increase towards the left, if you have the North Pole at the top. This is the opposite way from terrestrial maps. In addition, because astronomical telescopes don't have an extra lens put into to terrestrial telescopes especially to avoid this, they also invert their field. So when looking at the sky and north is at the top, east is also towards the left. Somehow it all makes sense, but maybe you have to stay up at night to appreciate it fully.
Direction Cosines
There is an alternative coordinate system, that of direction cosines. Effectively these are the vector components of the unit vector along three axes (x, y, z). Generally +y is to the Celestial North Pole, +x to the First Point of Aries, and +z at right angles to the other two, i.e. to RA 6 hours, declination zero.
These are supposed to be the components of a unit vector, so
x² + y² + z² = 1
Direction cosines are convenient for some types of calculation, by avoiding a bit of trigonometry, but are not very much used.
One problem in using them in catalogues of objects is that you need three columns of coordinates instead of just two, and you can't really find a value by inspection, since there is no obvious ordering of values akin to sorting on RA or DEC.
There is another minor problem: if you do a lot of calculations on these vectors, you may end up with an (x,y,z) triplet which is no longer normalised to unity, because of the accumulation of rounding errors. It's not always clear what you do then - scale all three components by the same value to restore the unit vector length is the common solution, but it's hard to justify mathematically.
Storage and Precision
Practically all computers now have floating-point systems conforming to the IEEE Standard 754. A single precision real number is stored in 32 bits, and has a 24-bit mantissa (23 bits plus sign). Because of a clever feature known as the hidden bit, actually you can use all 24 bits for the mantissa. The maximum precision of a single precision number is thus one in 1 in 2
24 or about 0.07 arc-seconds in a value which never exceeds 360 degrees. While the majority of celestial source catalogues have errors of measurement which are at least a bit larger than this, there isn't much margin for error. Mostly, therefore, people choose to store celestial coordinates in double precision values. These use 64-bits for storage, and have a 56-bit mantissa. The precision is thus 1 in 2
56 or at most 1.7 × 10
-11 arc-seconds in a value of 360°. As far as I know even radio-astronomers doing very long baseline interferometry never need more precision than that.
The Cone Search
What is the cone-search? It is a search for any objects in a circular patch of sky specified by a position (RA, DEC) and the radius of a circle around it. If you stop thinking of the celestial sphere and start thinking of infinite space, what you are searching in is a cone with its apex at the eye (or telescope) and extending to infinity - or the edge of the universe whichever comes first.
What you need to find, in a cone-search, is the distance between any given object in the sky, and the specified centre of the cone, and select that object if the distance is less than the specified radius. The distance needs to be the great-circle distance. On the celestial equator that is the same as the cartesian distance, but the scales get increasingly distorted as you get nearer to the poles. Right near a pole, two points can still be within a degree of each other even of their RAs differ by 180°. There are well-known formulae for solving spherical triangles, which you can find in textbooks, but they are not the ones to use. The simplest formula for distance, unfortunately, don't work at all well when the angles are small, as they involve taking the difference of two nearly equal quantities.
A good reference to spherical trigonometry of the conventional sort is
http://www.hps.cam.ac.uk/starry/sphertrig.html but don't use these formulae blindly. Since the cone in an astronomical search often is only a few arc-minutes or even arc-seconds across, one needs a better formula. The solution is to use haversines.
The haversine is the square of the sine of half the angle. In Fortran with the angle in radians, it can be computed like this:
double precision function hav(theta) ! haversign
double precision, intent(in) :: theta ! angle in radians
hav = (sin(0.5d0*theta))**2
end function hav
And similarly an inverse haversine like this:
double precision function ahav(x) ! inverse haversign, result in radians.
double precision, intent(in) :: x
ahav = 2.0d0 * asin(sqrt(x))
end function ahav
Then the great-circle distance function is also just a one-liner:
double precision function gcdist(ra1, dec1, ra2, dec2)
double precision, intent(in) :: ra1, dec1, ra2, dec2 ! coordinates in radians, result in radians
!
gcdist = ahav( hav(dec1-dec2) + cos(dec1)*cos(dec2)*hav(ra1-ra2) )
end function gcdist
Don't forget that the built-in trig functions use radians not degrees in all languages that I know about, so that you need to multiply by pi/180 to convert degrees to radians. SQL has functions called DEGREES and RADIANS built-in, but not many other languages do.
In mathematics as in life you can't usually get something for nothing and there is indeed a drawback to the haversine foumula: it produces inaccurate results when the great-circle distances gets close to 180°s; - but that's actually a case which rarely concerns us in practice, so it can probably be ignored.
There is more on the haversine formula on the web,
http://mathforum.org/library/drmath/view/51879.html is a good place to start.
Note that these distance functions work fine even when the points of interest are very close to the poles, or when one point is near 360° RA and the other is just the other size of the wrap-around point close to 0° RA because, of course, the arguments to the forward trig functins like SIN and COS do not mind whether or not the angle is in the normal range or not (within reason, they lose accuracy if given huge angles).
Indexing for the Cone-search
Typically one has a large collection of rows in some table each of which needs to be examined to see whether the (RA,DEC) of that row correspond to a point within the specified radius of the centre of the search. For large tables, this means lots of I/O and lots of trig. In order to speed things up it makes sense to narrow down the search to the set of records which are somehow near the region of interest. Since one is searching in two-dimensional space, it makes sense to use a spatial index, but not all DBMS are equipped with them, and using them is not always trivial. One-dimensional indexing is easier, and brings a considerable improvement in speed.
Since many astronomical tables are static, the simplest solution is to sort the rows in order of one coordinate of interest, either RA or DEC. Then one can use the binary search (sometimes called binary chop) algorithm to find the rows of interest: first find which half of the range contains the value of interest, then which quarter, then which eighth, etc. For a table of N rows, the number of stages needed is around log
2N. For the largest tables in use (such as USNO-B) with 10
9 rows, this means around 30 stages. You could make sorting more efficient by arranging to store a separate index containing the value in every 1000th row, and then searching through these to find the part of the table of interest, but this starts to look like a primitive B-tree, and you might as well go the whole way.
As far as I know, the only DBMS which rely on sorting are those written by astronomers. I think there are two reasons for this, you can guess one of them, the other one is that B-trees are simply more efficient. Typically a B-tree (where the B does
not stand for Binary) has a fan-out factor of something like 100. The number of levels in a B-tree indexing N rows is then around log
100N, which for N=10
9 amounts to around 5 levels. This makes it, typically, 6 times faster than a sorted table. Of course the upper levels of an index can be sensibly cached, so the actual ratio could be larger.
So my advice is: if your DBMS supports B-trees use them in preference to sorting. If it doesn't support B-trees, then consider switching to one which does.
1-D indexing/sorting: use RA or DEC?
This section covers the question of whether it is better, if you only use a one-dimensional index (or indeed sorting which is inherently one-dimensional) it is better to create the index on RA or DEC.
Let us assume that we are searching for points near (conera, conedec) within a radius of "conerad" and that (for simplicity) all values are in radians.
Index on RA
If the table covers the whole of the sky, then at first sight it seems it gives greater selectivity to use RA since its range is 2×pi radians (360°), whereas that of declination is only pi radians (180°) and there is likely to be a more uniform coverage in RA space than in declination, where even a uniform distribution of sources in the sky will lead to a cosine-shaped distribution along the declination axis.
There are a couple of problems, however, because of the wrap-around at zero RA, and the scale distortions near the poles.
The range of RA values to search is given by this pseudo-code:
IF (conedec + conerad > pi/2 OR conedec - conerad < -pi/2) THEN
Search the entire range of RA from 0 to 2pi
ELSE
theta = asin(sin(conerad) / cos(conedec))
Search the range of RA from (conera - theta) to (conera + theta)
END IF
There is an additional complication if the value of (conera-theta) is < 0 or (conera+theta) > 2pi i.e. the range encompasses the value of zero. In this case the index will have to be used twice, once for the slice near 0 and once near 2pi radians.
This all makes indexing on RA a bit messy, and for the rare case in which the cone includes a pole, very inefficient indeed.
Index on Declination
This is much simpler as there is no wrap-around problem, and you just have to search from (conedec - conerad) to (conedec + conerad).
Pixel-based methods such as HTM and HEALPix
The next step up from pure one-dimensional indexing is to use a pseudo-two-dimensional method by using a function which maps a position to an integer. There are three well-known ways of pixellating the sky (plus an infinite number of others, I guess).
- Igloo: What is called the Igloo method is the simplest, just draw lines of constant declination and then divide each strip into chunks, more or less like the blocks of an igloo. Unfortunately this does not really solve the problem that near the poles a large number of pixels join at a point, nor the wrap-around problem at 0 RA.
- HTM: The Hierarchical Triangular Mesh was invented by Geoffrey Dutton and later adopted and improved by astronomers especially the SDSS team at Johns Hopkins University. It divides the sky into triangular pixels, or trixels.
- HEALPix: This divides the sky into roughtly square pixels, and was invented by Kris Górsky and colleagues at ESO.
A fuller discussion of these pixel-based methods can be seen at
SkyIndexing and
DataCentreSpatialIndexing. Both the HTM and HEALPix methods are about equally good when you need to translate a point in the sky to an integer. It happens that the HTM algorithm is recursive and takes a lot longer to compute than HEALPix, and the complexity makes it much harder to implement in SQL, but otherwise they have equivalent functionality. In a cone search, however, you have to take into account the possibility that the cone will span more than one pixel (or trixel). The HTM library includes functions to find the HTM codes of pixels adjacent to that just computed, whereas with HEALPix I found the best thing to find all pixel-codes from an area was simply to superimpose a grid of celestial positions with a spacing just smaller thant he pixel-size over that patch of sky, and then find the HEALPix codes at each point. Of course there are some duplicates, but these are easy to weed out. When finding the pixel-codes corresponding to a small area of sky, therefore, there may not be much to choose between HTM and HEALPix.
When using any pixel-code method, you need to allow for the fact that the cone may translate into a
set of pixel-codes, and then use the B-tree, created on the pixel-code column, multiple times to find all rows in the range. These will, of course, include some rows not quite inside the desired cone on the sky, so the great-circle distance of each point still needs to be computed. The initial selection does the donkey work, however, so the amount of trig to be done is relatively tiny. Both HTM and HEALPix methods can therefore be quite efficient. Both of them cover the sky fairly uniformly, and avoid problems with wrap-around and polar singularities.
I may write more if I get time, e.g. on how to do spatial joins
--
ClivePage - 12 Aug 2004