Pixel Code Sizes
One of the original attractions of pixel-code methods was, I think, that every source in the sky could have its position uniquely encoded by an integer, so that:
- A source could be found by a simple integer look-up (using a B-tree index) in a catalogue stored as a database table, and
- Catalogues could be cross-matched simply by doing an equi-join on their columns of integer pixel-codes.
This is, unfortunately, not feasible, as the following considerations show.
Error-box sizes
Although most sources detected in the sky are unresolvable points, the positions are always uncertain to some extent. The factors causing this are:
- The coordinates are stored as floating-point numbers, so testing them for exact equality is unwise
- Some sources have a measured extent, for example galaxies, supernova remnants, planetary nebulae, clusters of galaxies, star clusters...
- Random errors of measurement: from the IR band upwards, this typically results from a finite number of photons being registered on an electronic detector (or earlier on a photographic emulsion).
- Systematic errors in converting the pixel coordinates on the focal plane to celestial coordinates, for example because of uncertainties in the plate-scale or a lack of nearby calibration objects.
- Proper motions: some nearby sources can be seen to move over periods of years, so that any measured position becomes out-of-date. Some catalogues estimate proper motions from detections at widely different epochs, but these values are themselves uncertain, so projections to any given epoch are uncertain too.
Although some recent optical and IR catalogues have random errors well below one arc-second, the combined influence of the other factors means that it is generally better to use an error-box of at least one arc-second in radius when attempting identifications or cross-matching two catalogues.
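As a minimal sketch of what matching within an error-box means in practice, the following compares two positions by angular separation rather than by equality. The function names and the default radius of one arc-second are illustrative assumptions, not part of any catalogue software.

```python
import math

ARCSEC = math.radians(1.0 / 3600.0)  # one arc-second expressed in radians


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians (haversine form); inputs in radians."""
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2.0 * math.asin(min(1.0, math.sqrt(h)))


def positions_match(ra1, dec1, ra2, dec2, radius=ARCSEC):
    """True if the two positions lie within `radius` radians of each other."""
    return angular_separation(ra1, dec1, ra2, dec2) <= radius
```

Two positions half an arc-second apart are accepted as the same source by this test, although their floating-point coordinates are not equal.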
Pixel-code Algorithms
All of the pixel-code methods allow the user to choose the level of tessellation of the sphere required. The choice of pixel size for a pixellation of the sky is somewhat arbitrary: choose pixels too large and a selection based on a single pixel number gets you too large a chunk of sky; choose pixels too small and a region of interest turns into too many distinct pixel numbers. The natural choice of just under 2³¹ pixels in the sky, so that the resulting codes can be stored in a signed 32-bit integer, generates pixels or trixels around 25 arcseconds across. This is quite a convenient size if cone searches typically look for cones of up to a few times that size.
HEALPix
The resolution depends on a parameter called Nside, which has to be of the form 2^k. The number of pixels on the sky for a given value of Nside is N, given by
N = 12 × (Nside)²
The area of each pixel is thus (4π/N) steradians.
The typical edge-length of the approximately square pixels is s = √(4π/N) radians.
If the error-circle has a radius of r radians, then the probability of a randomly placed error-circle crossing a pixel boundary corresponds to the fraction of the area of a pixel which is occupied by a zone of width r around the edge. This fraction, f, is given by
f = (s² - (s-2r)²)/s² = 4r(s-r)/s² ≈ 4r/s (when r « s)
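The expression for f can be checked numerically. The sketch below treats a pixel as a flat square of side s (an approximation; real pixels are curved and only roughly square), drops error-circle centres uniformly at random inside it, and counts those lying within r of an edge. The function names are illustrative.

```python
import random


def boundary_fraction_exact(s, r):
    """Fraction of a square of side s lying within r of its edge: 4r(s-r)/s²."""
    return 4.0 * r * (s - r) / (s * s)


def boundary_fraction_mc(s, r, n=200_000, seed=1):
    """Monte Carlo estimate of the same fraction from n random centres."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(0.0, s)
        y = rng.uniform(0.0, s)
        # The circle crosses the boundary if its centre is within r of any edge.
        if min(x, s - x, y, s - y) < r:
            hits += 1
    return hits / n
```

For s = 25.8 and r = 1 (both in arcseconds) the two functions should agree to within the sampling noise, and both come out slightly below the small-r approximation 4r/s.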
The maximum value of Nside for N to be storable in a (signed) 32-bit integer is Nside=8192, which results in N = 805,306,368 pixels.
The corresponding pixels are about 1.25e-4 radians or 25.8 arcseconds across.
For r = 1 arcsecond, this gives f = 15.5%.
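The HEALPix figures above follow directly from the formulas for N, s and f. A short sketch of the arithmetic, using only the standard library (the function names are illustrative and are not calls into any HEALPix package):

```python
import math

ARCSEC_PER_RADIAN = 3600.0 * 180.0 / math.pi
INT32_MAX = 2 ** 31 - 1  # largest signed 32-bit integer


def healpix_npix(nside):
    """Number of pixels on the sky: N = 12 × Nside²."""
    return 12 * nside * nside


def healpix_edge_arcsec(nside):
    """Typical pixel edge s = √(4π/N), converted to arcseconds."""
    return math.sqrt(4.0 * math.pi / healpix_npix(nside)) * ARCSEC_PER_RADIAN


nside = 8192
npix = healpix_npix(nside)
s = healpix_edge_arcsec(nside)
f = 4.0 * 1.0 / s  # small-r approximation 4r/s with r = 1 arcsecond
print(npix, round(s, 1), round(100.0 * f, 1))
```

Doubling Nside to 16384 quadruples N, which then no longer fits in a signed 32-bit integer.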
HTM
The parameter one can choose is D, and the number of triangular pixels (sometimes called trixels) is N, given by N = 8 × 4^D. The largest value of D to give N within the 32-bit signed integer limit is D = 14, giving N = 2,147,483,648.
The pixels vary in size and shape somewhat, but the mean area is clearly given by A = (4π/N). The trixels approximate to equilateral triangles of mean side s, and the area of an equilateral triangle of side s is given by (√3 s² / 4).
Thus s = √(16π / (N √3)) radians.
By analogy, the fraction of the area of a trixel occupied by a zone within r of the edge is the area of the edge strip, approximately 3rs, divided by the trixel area (√3 s² / 4), which gives:
f ≈ 4√3 r/s ≈ 6.9 r/s (when r « s)
Thus for N = 2,147,483,648, and r = 1 arcsecond, the mean trixel edge is around s = 1.2e-4 radians or 24.0 arcseconds, and the fraction f ≈ 29%.
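The trixel count and mean edge length quoted above can be reproduced with a few lines of arithmetic. This is only a sketch of the formulas N = 8 × 4^D and s = √(16π / (N √3)); the function names are illustrative and are not calls into any HTM library.

```python
import math

ARCSEC_PER_RADIAN = 3600.0 * 180.0 / math.pi


def htm_ntrixels(depth):
    """Number of trixels on the sky for parameter D: N = 8 × 4^D."""
    return 8 * 4 ** depth


def htm_mean_edge_arcsec(depth):
    """Mean trixel side in arcseconds, treating trixels as equilateral triangles."""
    n = htm_ntrixels(depth)
    s_radians = math.sqrt(16.0 * math.pi / (n * math.sqrt(3.0)))
    return s_radians * ARCSEC_PER_RADIAN


print(htm_ntrixels(14), round(htm_mean_edge_arcsec(14), 1))
```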
Consequences
For pixels or trixels around the size typically used (25 arc-seconds on a side), it is clear that one could not perform an integer equi-join using the pixel-code alone: it would miss every source whose error-circle, typically an arc-second in radius, spans a pixel boundary, and the estimates above show that these amount to well over a tenth of all sources.
--
ClivePage - 26 Feb 2004
Is it possible or desirable to allocate sources to pixels using double-counting of sources on a pixel/trixel boundary and then weed out the duplicates in query returns?
--
AnitaRichards - 26 Feb 2004
Sorry that I've only just got around to answering that question, which is a good one. It would be feasible to cope with the error region overlapping two (or more) pixels by inserting extra rows, with a unique identification per source, and then using that unique id to remove duplicates afterwards. But I can think of a couple of drawbacks:
- The number of extra rows might be quite substantial, e.g. well over ten percent for the values listed above, which are probably not untypical. This increases the size and I/O overhead of searches and joins.
- Removing duplicates requires the sorting of the results into order by the unique-identifier. This may be explicit (using ORDER BY) or implicit (using DISTINCT) in the SQL. Either way, it is a time-consuming step, whose cost grows faster than linearly with the number of rows.
- Actually implementing this is not trivial: when allocating a pixel-code to each source the process has to be capable of generating one or more additional rows, and inserting these into the table. This can't be done trivially within SQL, though systems with procedural add-ons such as the pl/pgsql of Postgres (and the rather similar facilities in Oracle) make it feasible.
--
ClivePage - 23 Mar 2004
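For illustration only, here is a rough sketch of the double-counting scheme discussed above, using Python's sqlite3 module. It assumes a toy flat-sky grid of square pixels with coordinates in arcseconds, which is a stand-in for HEALPix or HTM and not either of them; the table names, column names, grid constants and helper functions are all hypothetical. Each source gets one row per pixel touched by the bounding box of its error-circle, the catalogues are equi-joined on the pixel code, and DISTINCT weeds out the repeated pairs.

```python
import sqlite3

PIXEL = 25.0   # hypothetical pixel side, arcseconds
RADIUS = 1.0   # error-circle radius, arcseconds
NX = 1000      # pixels per row of the toy grid


def pixel_codes(x, y, r=RADIUS):
    """All toy pixel codes touched by the bounding box of the error-circle."""
    xs = range(int((x - r) // PIXEL), int((x + r) // PIXEL) + 1)
    ys = range(int((y - r) // PIXEL), int((y + r) // PIXEL) + 1)
    return [iy * NX + ix for iy in ys for ix in xs]


def load(conn, table, sources):
    """Create an indexed table and insert one row per (source, pixel) pair."""
    conn.execute(f"CREATE TABLE {table} (id INTEGER, x REAL, y REAL, pix INTEGER)")
    conn.execute(f"CREATE INDEX {table}_pix ON {table} (pix)")
    for sid, x, y in sources:
        conn.executemany(
            f"INSERT INTO {table} VALUES (?, ?, ?, ?)",
            [(sid, x, y, code) for code in pixel_codes(x, y)],
        )


def cross_match(conn):
    """Equi-join on pixel code, keep pairs within RADIUS, drop duplicates."""
    sql = """
        SELECT DISTINCT a.id, b.id
        FROM cat_a AS a JOIN cat_b AS b ON a.pix = b.pix
        WHERE (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) <= ?
        ORDER BY a.id, b.id
    """
    return conn.execute(sql, (RADIUS * RADIUS,)).fetchall()


conn = sqlite3.connect(":memory:")
# Sources 1 and 101 are 0.5 arcseconds apart but on opposite sides of x = 25.
load(conn, "cat_a", [(1, 24.8, 10.0), (2, 60.0, 60.0)])
load(conn, "cat_b", [(101, 25.3, 10.0), (102, 80.0, 80.0)])
matches = cross_match(conn)
```

In this toy case sources 1 and 101 have different home pixels, so a join on a single pixel code per source would miss the pair, while the expanded tables produce it twice before DISTINCT reduces it to one row. The extra rows and the de-duplication step are exactly the overheads listed in the reply above.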