use astrogrid
create table gsc(
pcode int not null, -- HEALPix index
gscid int not null, -- GSC map number (4 digits) // GSC_ID (5 digits)
gscra float not null, -- RA (radians)
gscdec float not null, -- Dec (radians)
pos_err real not null, -- posn error
(assumed 1 sigma, PCODE generated using 3 sigma)
mag real not null, -- magnitude
mag_err real not null, -- magnitude error
mag_band int not null,
class int not null,
plate_id varchar(4) not null, -- 4 chars
multiple varchar(1) not null -- should be logical
)
create table usno(
uid int not null, -- arbitrary sequential number
ura float not null, -- RA (radians)
udec float not null, -- Dec (radians)
rmag real null, -- there are some nulls here
bmag real null, -- ditto
upcode int not null -- HEALPix code
)
Loading the data:The easiest way to load ASCII data into SQL Server is to use the
DTS Import/Export Wizard GUI, which is launched from the Enterprise Manager GUI: the Data Transformation Services (DTS) packages provides "a set of graphical tools and programmable objects that lets you extract, transform, and consolidate data from disparate sources into single or multiple destinations".
The DTS Import/Export Wizard leads the user through a set of steps specifying the input data format and the database table location, and offers the user tha ability to save the parameters thus set into a package that that can be run again later. CGP's USNO input catalogue contained a number of "\N" entries denoting null values. SQL Server
didn't like those, so I had to replace them by a default value before I could get the USNO dataset to load properly. My one previous experience of loading into SQL Server data that contains nulls suggests that it is far easier to do that using a CSV (comma-separated variable) format input datafile -- SQL Server happily reads in a row containing entries "...1,2,,3..." and understands that there's a null value being set between the 2 and the 3.
There are other ways to import data from files. From the Query Analyser GUI, one can run a BULK INSERT command and there is also the bcp command prompt utility which does the same thing: I think that these two options would allow the import of data in binary format (not possible through the DTS Import/Export Wizard), but I'm not 100% sure of that.
Indexing the tables:create index gsc_pcode_index on gsc(pcode) create index usno_pcode_index on usno(upcode)Working out how much extra space is taken up by the index is a slightly non-trivial, due to the way that SQL Server reserves disk space for itself. The user can specify how this can be done, but the default is to leave it up to SQL Server to grow the database as and when required. It seems that, when using this option, disk space reserved for a particular database is not immediately freed once the data filling it is deleted. For example, I was playing around, loading different-sized subsets of the USNO catalogue in SQL Server, to see how the load and index time scaled, and found that the size of the database file didn't vary with the number of rows. It took me a little while to work out that SQL Server had expanded the disk space reserved for the database as I was loading the full USNO catalogue, but hadn't released it immediately when I deleted that table and tried loading smaller subsets. Simple join on PCODE values:
select * from gsc, usno where pcode=upcodetakes about 30 seconds to run (finding 358455 matches), but running a second query, which also writes the results into a new table
select * into pcodetable from gsc, usno where pcode=upcodestraight afterwards only takes 14 seconds, presumably because much of the data is still cached. Great-circle join:
select distinct *, round(degrees(7200*asin(sqrt(square(sin(0.5*(u.udec-g.gscdec))) + cos(u.udec)*cos(g.gscdec)*square(sin(0.5*(u.ura-g.gscra)))))),2) as sep into myjoin from gsc as g, usno as u where g.pcode=u.upcode and 2.0*asin(sqrt(square(sin(0.5*(u.udec-g.gscdec))) + cos(u.udec)*cos(g.gscdec)* square(sin(0.5*(u.ura-g.gscra))))) < 5e-6+pos_errbut that was mainly due to a combination of lack of familiarity with SQL and typos. That ran in 25 seconds, writing 241460 rows into the myjoin table. Statistical queries:
select avg(bmag) from usno where bmag < 99.0for the average of the USNO B mags, and
select stdev(bmag) from usno where bmag < 99.0for their standard deviation. Together, these two queries took 15s to run. Defining user functions:
IF EXISTS (SELECT name
FROM sysobjects
WHERE name = N'fGreatCircleDist' )
DROP FUNCTION fGreatCircleDist
CREATE FUNCTION fGreatCircleDist (@ra1 float, @dec1 float, @ra2 float, @dec2 float)
-------------------------------------------------------------
--/H Returns float holding great circle distance [in arcsec]
--/H between two points (ra1,dec1) & (ra2,dec2) [ras,decs in radians]
-------------------------------------------------------------
RETURNS float
AS
begin
return 2.0*asin(sqrt(square(sin(0.5*(@dec1-@dec2))) +
cos(@dec1)*cos(@dec2)* square(sin(0.5*(@ra1-@ra2)))))
END
which tests if there already exists a function called fGreatCircleDist, and, if not, defines it in the appropriate manner. This I tested with the followingdeclare @ra1 float declare @dec1 float declare @ra2 float declare @dec2 float set @ra1=1.0 set @dec1=1.0 set @ra2=2.0 set @dec2=2.0 print dbo.fGreatCircleDist(@ra1,@dec1,@ra2,@dec2)and checked the results against those from the gcirc procedure from the IDL astrolib library. Conclusions:
![]() |
Click here for the AstroGrid Service Web |
This is the AstroGrid Development Wiki |
|