Cross Match Algorithms (updated)

I wanted to put on record the fact that another algorithm for performing a spatial join (or catalogue cross-match) has recently been puhlished. The algorithm was invented by the Australian VO team (Dave Abel, Drew Devereux, Robert Power, and Peter Lamb). The algorithm is described in a technical report entitled An O(NlogM) Algorithm for Catalogue Matching. Details of the performance are given in another technical report, Benchmarking Catalogue Cross Matching, by Robert Power and Drew Devereux.

Several of the catalogues they used are the same as those in our own evaluations, so it is interesting to compare the resulting matching speeds, even though the hardware is rather different. When matching large catalogues, their plane-sweep join is around 10 times faster than the join based on R-trees using Postgres, but when joining a small table with a large one, the R-tree join can be more than 10 times faster.

This came as a slight surprise, but a little thought shows it actually quite explicable. The R-tree join works by reading the first table sequantially (the smaller one if you arrange things sensibly) and then using the R-tree index to look up the appropriate rows of the second (larger) table. If the first table is quite small, this is efficient. If the two tables are of comparable size, then the whole of one is read sequentially, while practically the whole of the other is read but by indexed look-up, which is bound to be more time-consuming.

The plane-sweep join reads the whole of both tables from disc: even if one table is small it still means that the whole of the larger table has to be read from disc, whereas the R-tree method will only read relatively few records from this table.

The plane-sweep join also benefits from the fact that it is implemented not using a general-purpose relational database management system, but using custom-build C++ code. It is hard to estimate how much benefit this might bring, but I'd guess a speed-up factor of a few (maybe 3 to 10).

One interesting point: the plane-sweep algorithm reminded me of one that I had seen before, Scalable Sweeping-Based Spatial Join by Aage et. al., which was published in 1998 in the proceedings of the VLDB Conference. When I drew his attention to it, Dave Abel said that he had not come across this paper before; so it is obviously yet another case in which much the same idea has arisen independently more than once. Fortunately the Australian-VO team have been able to demonstrate it on a number of actual astronomical datasets, and the results show it to be of considerable value when cross-matching pairs of large catalogues.

-- ClivePage - 16 Sep 2004 (restored 18 Nov 2004)

Topic revision: r1 - 2004-11-19 - 14:08:00 - ClivePage
 
AstroGrid Service Click here for the
AstroGrid Service Web
This is the AstroGrid
Development Wiki

This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback