DBMS Evaluations

This page will be updated to record the results of our evaluation exercise as they become available.

What do we need a DBMS for?

There are four fairly distinct areas of work associated with AstroGrid where DBMS technology would seem to be useful:

  • Pipeline processing: Here the main need for a DBMS is to track the progress of datasets through a collection of processing stages. The requirements on a DBMS in this area are rather different from those below, specifically:
    • Handling metadata rather than actual scientific data
    • Must have high reliability and availability
    • Must support transactions with rollback (because there may be concurrent writes to the same table)
Because of these differences, and because the VISTA project is now managed independently, the requirements of a DBMS for pipeline processing have not been considered further in this evaluation exercise.

  • Data archiving: Existing data archives have adopted a wide range of DBMS packages, so it is clear that the basic requirements can be easily satisfied by just about any package. But data archives of the future will need to be more capable, and the more demanding aspects are likely to be:
    • Storage of huge volumes of data.
    • Handling complex data structures - object-oriented and object-relational DBMS have looked attractive here, but OODBMS have not gained a significant following in scientific databases or elsewhere.
    • XML interfaces: so that SOAP and WSDL interfaces can be easily added.
    • Many data files are getting close to the 2 GB limit of a 32-bit file system, so operation on 64-bit file systems will soon be important.

  • Data mining: This term means different things to different groups of people, and at the lower end of the scale are a range of operations which can be carried out over the network using the more capable of the existing data archives. New data archives will be planned with data mining in mind, so it makes sense to choose a DBMS which can handle both aspects. But certain very demanding investigations, particularly those which require very large tabular datasets to be scanned sequentially, may overwhelm existing data archives. These may also be difficult to carry out with data accessed over a network (because of bandwidth and latency limitations). AstroGrid proposes to support such operations by providing dedicated data warehouse facilities, which have both large fast disc stores and farms or clusters of fast processors. The scalability of DBMS to such an environment was therefore also of interest.

  • AstroGrid data management: the management of these data warehouses, and concepts like MySpace, will need some DBMS techniques, but these are likely to be fairly modest in scale, and not especially demanding.

Which DBMS?

We were able to carry out a limited evaluation of the two leading freeware DBMS, PostgreSQL and MySQL, and two of the leading commercial DBMS, Oracle and Microsoft SQL Server. Originally we also hoped to include IBM DB2 and Sybase; some DB2 results have since been added to the tables below, but Sybase has not yet been covered. Of these, all but MySQL claim to be object-relational, in that they have some facilities for handling user-defined data structures.

What to Test

The basic criteria for the evaluation were listed in this document, which has been open for comments for some time. As we expected, some of the requirements were fully satisfied by all DBMS packages (e.g. provision of trigonometric functions), while other differences which we had not anticipated came to light during testing. For this reason the table below, which gives basic information about the DBMS selected for evaluation, does not correspond exactly to our original list of criteria.

Disclaimer: the information on these pages, including the tables below, is the result of a preliminary and incomplete evaluation of database management packages for handling scientific data, a role for which they were not expressly designed. Some of the information may well be incorrect or may paint an incomplete picture. Please let the authors know of any errors that you find.

|  | PostgreSQL | MySQL | Oracle | SQL Server | DB2 |
| Supplier | www.postgres.org | www.mysql.com | Oracle Corporation | Microsoft Corporation | IBM |
| Licence | free (BSD-style) | free (GPL) or commercial | commercial | commercial | commercial |
| Market position | widely regarded as most powerful freeware DBMS | most popular freeware DBMS | DBMS market leader | DBMS from world's largest software company | leading software/hardware company |
| Host platforms | Linux, many Unix, Windows | Linux, many Unix, Windows | many Unix, Linux, PC/Windows, IBM mainframes | Windows only | Linux, Windows, other |
| Test platform |  |  |  |  |  |
| Version tested | v7.2.1 | 3.23.51 | Oracle 9i v9.2.0.1.0 | SQL Server 2000 Enterprise Edition | v7.2 |
| Test hardware | Pentium 450 MHz | Pentium 450 MHz | Compaq XP900 workstation | Pentium III 1.2 GHz | Athlon 1.7 GHz |
| Test operating system | Linux (Redhat 7.2) | Linux (Redhat 7.2) | Tru64 Unix v5.1 | Windows 2000 Advanced Server | Linux (Redhat 7.3) |
| Test location | Leicester | Leicester | Edinburgh | Edinburgh | Leicester |
| Time to download, install, and start using | ~30 mins | ~10 mins | ~2 weeks | ~15 mins | few hours |
| Documentation | on-line tutorial, user's, admin, and programming guides | on-line reference manual, tutorials | about 100 manuals (all on-line) | on-line manuals | many on-line manuals |
| Query language |  |  |  |  |  |
| SQL92 level | approx entry level | less than entry level | entry level | entry level | entry level |
| Transactions with rollback | yes | only with InnoDB | yes | yes | yes |
| Procedural programming language | PL/pgSQL | none | PL/SQL | Transact-SQL | PL/SQL |
| Statistical functions | AVG, STDDEV, VARIANCE | AVG, STDDEV | AVG, STDDEV, VARIANCE, and many correlation and regression functions | AVG, STDEV, STDEVP, VAR, VARP | AVG, STDDEV, VAR |
| Outer joins | left, right, full | left, right | left, right, full | left, right, full | left, right, full |
| LIMIT/TOP n, OFFSET | LIMIT, OFFSET | LIMIT | equivalent functionality | TOP | equivalent functionality |
| User-defined functions | in PL/pgSQL or C | in C only | in PL/SQL or Java | in Transact-SQL, C, C++, VB, ... | in C, Java, Perl, SQL, ... |
| Data types |  |  |  |  |  |
| Integer | 2, 4, 8 bytes | 1, 2, 4, 8 bytes | NUMBER(38) | 1, 2, 4, 8 bytes | 2, 4, 8 bytes |
| Floating point | 4, 8 bytes | 4, 8 bytes | NUMBER(p,s) | 4, 8 bytes | 4, 8 bytes |
| Boolean | yes | yes | no | yes | no |
| Date/time | date, time, timestamp | date, time, datetime, timestamp | date, timestamp | datetime, smalldatetime, timestamp, interval | date, time, timestamp |
| String | char, varchar | char, varchar | char, varchar, long | char, varchar, text, binary, varbinary | char, varchar, long varchar |
| Spatial types | rectangular box | - | with Spatial option | no | as an option (geographical) |
| BLOBs | yes | yes | yes | yes | yes |
| User-defined types | yes | no | yes | yes | yes |
| Indexing | B-tree, R-tree on box | B-tree | B-tree (R-tree with Spatial Data option) | B-tree | B-tree |
| Software interfaces |  |  |  |  |  |
| Language APIs | C, C++, Java, ODBC, Perl, PHP, Python, TCL | C, C++, Eiffel, ODBC, Java/JDBC, Perl, PHP, Python, TCL | C, C++, COBOL, Fortran, Java, PL/I | C++, J++, VB, ODBC, ADO, OLE DB, JScript, VBScript | C, C++, Java, Perl |
| Data import/export |  |  |  |  |  |
| Binary import | ? badly documented | no | ? | yes, for image (jpeg, tiff) only | jpeg, tiff only |
| Binary export | yes, intractable format | no | no | yes | no |
| HTML export | yes | yes | yes | yes | yes |
| XML export | only via HTML | only via HTML | yes | yes | yes |
| Scalability |  |  |  |  |  |
| Mapping to host file system | complex: table maps to many files | simple: database is a directory, table is a file | complex | simple: database is a single file, log another; also complex: db maps to file groups for data and logs | complex |
| Support for 64-bit CPUs | Alpha, Ultrasparc, IBM, SGI, Intel chips | Alpha, Ultrasparc | yes | yes, in beta - for Itanium chips | yes |
| Support for SMP systems | performance gain only for multiple concurrent queries | none obvious | yes | automatically sets degree of parallelisation: can perform parallel queries, indexing, etc | yes, and parallelisation strategies could even work in SP mode (IBM says) |
| Max table size | 64 TB (less for some O/S) | 4 GB (more on 64-bit file systems) | no obvious limit | 1 Exabyte | 512 GB |
| Max single field | 1 GB | ? | 2 GB | 2 GB | ? |
| Max number of rows | "unlimited" | no obvious limit | no obvious limit | limited by storage | 4 x 10^9 |
| Max number of columns | 1600 | ? | 1000 | 1024 | 1012 |
| Max indexed columns | unknown | 32 (more by recompiling) | no limit | 250 per table | 32767 or storage limited |

Performance Tests

A number of performance tests were carried out. The test datasets were samples of the following all-sky catalogues, covering about 400 square degrees in a box from 0 to 4 hours in RA and 30 to 37.5 degrees in declination, which amounts to a sample of about 1% of the sky.

  • GSC - Hubble Guide Star Catalog v1.2: 275,154 rows, 11 columns including PCODE.
  • USNO-A2 - US Naval Observatory Catalog: 3,476,948 rows, 6 columns including PCODE.

These were converted from the original FITS tables or binary format into text files, as most DBMS are unable to ingest data in any more efficient format. Note: the additional PCODE column was computed during the conversion to plain ASCII, as were the additional rows required by the PCODE algorithm. The times reported in these tests are elapsed times, not just CPU times, since I/O overhead is often a significant factor, but care was taken that the machine was virtually idle during the testing, apart from the DBMS processes.
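
Even the bulk loading of such a text file needs a different command in each DBMS. As a minimal sketch, assuming a comma-separated file /data/usno.txt and a table called usno (neither of which is recorded in the text), the same load looks roughly like this in the two free DBMS:

  -- PostgreSQL: server-side COPY (file path and delimiter are assumptions)
  COPY usno FROM '/data/usno.txt' USING DELIMITERS ',';

  -- MySQL: the equivalent bulk load of the same file
  LOAD DATA INFILE '/data/usno.txt' INTO TABLE usno
       FIELDS TERMINATED BY ',';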

Catalogue (fuzzy) Joins: R-trees versus PCODE

The cross-matching of celestial sources in two (or more) catalogues of the sky is an important operation, being the basis of many astronomical data mining investigations. In principle it merely requires a JOIN between tables, but this has to be based on an approximate match of coordinates within combined error limits, using a great-circle distance, since the Cartesian approximation fails at high declinations.

Until recently it was generally assumed that to perform a fuzzy join efficiently it was necessary to have a spatial index available, but spatial indexing is supported by only a few DBMS. Recently, however, I devised an alternative algorithm which I call the PCODE method, based on a pixellation of the sky. This is described more fully in my SkyIndexing document. The PCODE method requires an additional PCODE column in the tables, and some extra rows, but the DBMS only has to carry out an integer equi-join, followed by some simple post-processing to remove a small number of spurious matches and duplicated results, so it is within the capabilities of any common-or-garden DBMS.
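
A PCODE join can therefore be written in quite ordinary SQL. The sketch below uses PostgreSQL syntax; the table and column names (gsc, usno, pcode, ra, decl) and the 10 arcsecond matching radius are only assumptions for the purpose of illustration, and the PCODE values are assumed to have been computed when the data were loaded:

  -- Hypothetical PCODE fuzzy join: integer equi-join plus great-circle filter
  SELECT g.ra, g.decl, u.ra, u.decl
  FROM   gsc g, usno u
  WHERE  g.pcode = u.pcode                      -- cheap integer equi-join
  AND    DEGREES(ACOS(  SIN(RADIANS(g.decl)) * SIN(RADIANS(u.decl))
                      + COS(RADIANS(g.decl)) * COS(RADIANS(u.decl))
                        * COS(RADIANS(g.ra - u.ra)) ))
         < 10.0/3600.0;                         -- great-circle distance filter

The post-processing to remove the duplicate pairs arising from the extra PCODE rows is not shown; a SELECT DISTINCT on the source identifiers is one simple way of doing it.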

I thought it would be interesting to compare these methods on a suitable DBMS. Postgres has a built-in spatial data type with R-tree indexing, while Oracle has a Spatial Data Option with similar or perhaps more advanced facilities. It turned out to be much easier to get PostgreSQL working than Oracle, and the results are shown in the next table.

|  | R-tree | PCODE with B-tree |
| Time to load USNO | 275 s | 243 s |
| Time to create index | 6408 s | 233 s |
| Size of resulting table | 496 MB | 293 MB |
| Size of index | 248 MB | 62 MB |
| Time to compute fuzzy join | 347 s | 143 s |
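
For comparison, the R-tree side of the test can be expressed in PostgreSQL roughly as follows; the errbox column (of the built-in BOX type holding each source's error box) and the table names are assumptions, since the exact schema used is not recorded here:

  -- Hypothetical R-tree variant: index the assumed error-box column and
  -- join on box overlap (&& is the PostgreSQL box-overlap operator)
  CREATE INDEX usno_errbox ON usno USING rtree (errbox);

  SELECT g.ra, g.decl, u.ra, u.decl
  FROM   gsc g, usno u
  WHERE  g.errbox && u.errbox;

Pairs passing the overlap test would still need a final great-circle distance check, just as in the PCODE case.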

The superiority of the PCODE join seems clear. In particular the time to create the R-tree index is rather high: if it scales linearly (and it may be worse than that) the time to create an R-tree of the whole USNO-A2 catalogue rather than under 1% of it would be just under 14 days. This is consistent with results reported at an ADASS conference.

PostgreSQL versus MySQL versus cFITSIO

The following tests compare the two freeware DBMS on the same computer (a Pentium II with a 450 MHz processor and internal IDE discs). In order to assess the overheads (or otherwise) of using a DBMS, I decided also to compare these two with compiled code using the cFITSIO library to access the same datasets stored in a FITS binary table. (DB2 figures are included in the table for reference, but note that, as recorded in the table above, the DB2 tests were run on a different and faster machine, a 1.7 GHz Athlon.) These tests were limited to (a) a simple join on PCODE and (b) a sequential scan for mean and standard deviation. For the DBMS, two joins were computed:

  1. A basic equi-join on the PCODE columns.
  2. The equi-join followed by the necessary great-circle distance filter to weed out sources which have identical PCODE values but are not actually within the required distance.

Various other tests, including a LEFT OUTER JOIN, were also carried out, but the times were generally comparable and not worth showing separately. Finally, a sequential scan was carried out, finding the mean and standard deviation of a single column, and then (for the DBMS) selecting the points more than 3 sigma above the mean (there were only 18 such points out of 3.5 million).
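
For reference, the sequential-scan tests correspond to SQL along the following lines (PostgreSQL syntax; the column name bmag follows the text, but the exact queries used are not recorded here, so treat this as a sketch). MySQL 3.23 lacks subqueries, so there the mean and standard deviation have to be obtained first and pasted into the second query as constants:

  -- Sequential scan: mean and standard deviation of one column
  SELECT AVG(bmag), STDDEV(bmag) FROM usno;

  -- Select the points more than 3 sigma above the mean, using a derived
  -- table so that the statistics need not be typed in by hand
  SELECT u.*
  FROM   usno u,
         (SELECT AVG(bmag) AS m, STDDEV(bmag) AS s FROM usno) stats
  WHERE  u.bmag > stats.m + 3 * stats.s;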

|  | PostgreSQL | MySQL | cFITSIO | DB2 |
| Time to load USNO sample | 243 s | 80 s | 143 s | 30 s |
| Time to create B-tree index | 233 s | 72 s | 36 s | 50 s |
| Size of resulting table | 293 MB | 129 MB | 111 MB | 184 MB |
| Size of index file | 62 MB | 32 MB | 30 MB | ~30 MB |
| Equi-join on PCODE | 143 s | 121 s | 102 s | 75 s |
| Find mean, stddev of usno (3.5M rows) | 54 s | 7.3 s | 2.0 s | 9 s |
| Select from usno for bmag > mean + 3 sigma | 126 s | 13.8 s | - | 15 s |
| select count(*) from usno | 41 s | 0 s | - | 5 s |

It is clear that there is a significant DBMS overhead on most operations, especially those involving a sequential scan. It should be noted, though, that the B-tree code used in the cFITSIO tests is relatively basic and could no doubt be optimised to perform better. The data loading of the FITS file was also slow: again, no attempt was made to optimise this at all.

Effect of Processor Speed

Most of the evaluation work done at Leicester used an elderly PC with a Pentium II running at 450 MHz. I ran some tests of MySQL on an otherwise fairly similar machine with a 1200 MHz Pentium. The received wisdom is that most database operations are I/O-limited, but the figures below suggest otherwise. All times are elapsed times in seconds.

|  | 450 MHz | 1200 MHz | ratio |
| Import USNO | 81.1 | 49.7 | 1.6 |
| Import GSC | 8.9 | 3.3 | 2.7 |
| Create index on USNO | 64.9 | 65.1 | 1.0 |
| Equi-join on PCODE | 126.5 | 21.1 | 6.0 |
| Find mean and stddev of BMAG | 7.0 | 1.8 | 3.9 |
| Select bmag > 23.2 | 6.8 | 1.5 | 4.5 |

I do not understand all of these figures, particularly the fact that index creation did not seem any faster, but in the other cases the operations are clearly speeded up by a faster CPU. The fact that the FITSIO operations are so much faster than MySQL also suggests that disc I/O is not the limiting factor in most operations, which I find somewhat surprising.

Column-oriented Tables versus Row-oriented Tables

I am grateful to Prof Peter Buneman of Edinburgh University for rekindling my interest in this topic. He pointed out that where it is necessary to do a sequential scan of a large table, but only one or two columns out of a large number are of interest, it should be much more efficient to store the data for each column contiguously, rather than each row. I was able to point out to him that astronomers had already thought of this: the ESO/MIDAS Table System stores its data column-wise, and in the STSDAS table file system (an add-on to IRAF produced at STScI) the creator of a file has the option of using row or column orientation. It then occurred to me that the FITS binary table format might just be perverted to store data column-wise: the necessary change is to have a table with just one row, but with each column defined as a vector of NR elements, where there would normally be NR rows. I consulted the author of the FITSIO library, William Pence of GSFC, who told me that there was nothing in the library which would prevent such access being handled efficiently; fortunately the interface routines are flexible enough to make this perfectly feasible.

I therefore wrote a short program to convert the usno.fits file into column-major order, and rewrote my fstats program to handle this format. Neither task took long. The results below are for the computation of the mean and standard deviation of a single column (bmag) of the usno sample table (3.5 million rows), using the 450 MHz PC:

|  | FITSIO row-oriented | FITSIO column-oriented |
| Find mean and std. dev. of 3.5M rows | 1.95 s | 0.66 s |

The speed-up factor of around 3 is perhaps not all that impressive, but this is a table of only seven columns, and the overheads of starting the program and opening the FITS table may be significant; using a larger table would throw light on this. It is notable, though, that an operation which takes 54 seconds in PostgreSQL can be speeded up by a factor of around 80 by using FITSIO and column orientation. The final figure of 0.66 seconds to retrieve 3.5 million values of 4 bytes each equates to a bandwidth of about 21 MB/s, which I suspect is close to the bandwidth limit of the internal IDE disc in this nearly 3-year-old PC. It is discouraging that the best DBMS we have tested cannot manage even a tenth of this speed.

Detailed Notes

More detailed notes on the various evaluations are available in separate documents:

Preliminary Conclusions

For data archiving, it is obvious from the number of systems using these DBMS (and many others) that reasonable facilities are provided by any modern DBMS. It is not clear that the object-relational facilities are of significant benefit. All of these DBMS will store BLOBs (binary large objects), and such facilities have been available for many years, but hardly any astronomical archives have chosen to use them, and all those that I know of also duplicate these objects in external storage. The ability to define one's own data types gives, in principle, superior facilities, such as indexing on the properties of such objects. In practice, however, the effort involved (which requires creating suitable indexing functions and interfacing with the query optimiser to take advantage of such indices) seems disproportionate to the gain.

MySQL seems generally faster than PostgreSQL, and has an appealing simplicity. We cannot directly compare the speeds of these two free DBMS with the commercial ones, because we did not use the same hardware, but there does not appear to be a big difference in query performance. The commercial DBMS vendors have recently put considerable effort into getting their products to take advantage of PC clusters or farms; it may be possible to get the free DBMS to scale as well, but this is not built in and would probably require a lot of development effort.

As far as data mining is concerned, these systems all have quite severe limitations; although there is no single "show-stopper", they look far from ideal. It is hard to separate the limitations of the DBMS from those of SQL, but since SQL is essentially the only interface, perhaps such a distinction is unnecessary.

  • Non-procedural syntax: The syntax of SQL is result-orientated, which most scientists find counter-intuitive, since they are used to breaking a complex operation down into simpler stages. With SQL you do not do that; indeed you often have to work backwards from the desired result.

  • Multi-line commands in SQL are the norm, but it is hard to get them right; error messages are cryptic and generally unhelpful.

  • Poor standards-compliance: Although there is an international standard for SQL92, there is hardly a single DBMS on the market which implements even the whole of its Entry Level (the lowest of four levels). And since NIST gave up compliance testing some years ago, the situation is likely to get worse. This particularly affects the VO, as users are likely to interact with a number of different DBMS, all speaking different dialects. Although the basic facilities can be wrapped in GUI functions, the power user may want the full power of SQL. Many commands which will often be required, such as bulk loading of data, exporting to a text or HTML file, performing outer joins, and selecting into another table, all involve different SQL on different DBMS. Even something as simple as squaring a value can be done, variously, as POW(x,2), or x^2, or SQUARE(x); in fact none of these forms is defined in SQL92 at all (some sketches of the dialect problem are given after this list).

  • No metadata: nothing besides the column name and its data type is handled by the DBMS. It is possible to devise a complex schema in which separate tables hold the metadata, but if you use SQL to select or project new tables, these metadata will be left behind.

  • Floating-point numbers are displayed poorly, with arbitrary switching from fixed-point to exponential notation, and arbitrary decimal places. Most DBMS have functions to format such numbers in a controlled way, to get aligned decimal points in columns, but many of these are non-standard. None of the DBMS have functions built-in which can input or output angles in sexagesimal format; in principle one could use the formatting functions provided to build suitable functions, but the languages in which external functions can be written differ surprisingly from one DBMS to another.

  • Nulls are supported fully by all the DBMS examined so far, but there appears to be no standard way of representing them in imported or exported text files.

  • Binary import/export is an almost essential facility since many sources of data will be large, and probably available in FITS binary format, but is almost never supported. Conversion to text is liable to lose precision, is error-prone, slow, and uses unnecessary volumes of disc space.

  • User-written functions can be provided in all systems examined so far, but the languages accepted vary, and none of the mechanisms is very easy to use.

  • Resource usage, in terms of the disc space likely to be occupied by a table after ingestion, is very hard to estimate on most systems. Here MySQL is an exception, and its overheads (with the MyISAM backend) are low.

  • Network access: neither of the free DBMS has any mechanism for accessing tabular information held on other networked computers, and none of them supports a truly distributed database. In contrast, my fstats program was instantly able to process a file on a remote system, merely by being given a URL instead of a filename for the FITS table; when reading a file over the LAN the speed was almost as fast as when reading a local file. FTP and HTTP access are built into the FITSIO library, of course, so this comes for free.

  • Graphical operations are conspicuous by their absence: a data exploration system really ought to allow the user to plot histograms, plot one column against another, and perform similar operations. Even producing a simple histogram as a table involves some messy SQL (a sketch is given after this list).
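
As a small illustration of the dialect problem mentioned above, here is the same trivial operation, squaring a column value, written for several of the DBMS tested (illustrative only; support also varies between versions, and the table and column names are assumptions):

  SELECT POW(bmag, 2)   FROM usno;   -- MySQL (PostgreSQL also accepts POW/POWER)
  SELECT POWER(bmag, 2) FROM usno;   -- Oracle, SQL Server, DB2
  SELECT SQUARE(bmag)   FROM usno;   -- SQL Server only
  SELECT bmag ^ 2       FROM usno;   -- PostgreSQL (in MySQL, ^ means bitwise XOR)

And the nearest one can get to a histogram is a GROUP BY query of roughly the following form (again a sketch with assumed names), which still only yields a table of bin counts rather than a plot:

  -- Histogram of bmag in 1-magnitude bins, returned as a table of counts
  SELECT FLOOR(bmag) AS bin, COUNT(*) AS n
  FROM   usno
  GROUP  BY FLOOR(bmag)
  ORDER  BY bin;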

I am not sure what the solution to the data mining and data exploration problem is, but I think that an SQL-speaking DBMS forms only a small part of it.

-- Clive Page and other members of the AstroGrid database team, 19 Aug 2002


I just came across an interesting comparison of MySQL, PostgreSQL, and Oracle on the CERN web site - somewhat out of date, but still useful in parts: see http://hep-proj-database.web.cern.ch/hep-proj-database/db_compar.htm Note added 2003 July 30: unfortunately this link now seems to be broken and I cannot find the document even on the Wayback Machine. It contained some strong opinions on these products, and one can only speculate on why it has been withdrawn.

Another interesting document from CERN, on their Spitfire project, was presented at GGF5; see this PDF file.

Another interesting comparison of SQL support in DBMS comes from Troels Arvin and his links.

Contributors to this page include Bob Mann and Clive Davenhall of the University of Edinburgh, and Patricio Ortiz and Clive Page of the University of Leicester.

-- Last updated by Clive Page, 15 Dec 2003.
