DBMS Evaluations
This page will be updated to record the results of our evalutation exercise as they become available.
What do we need DBMS for?
There are around four fairly distinct areas of work associated with
AstroGrid where DBMS technology would seem to be useful:
- Pipeline processing: Here the main need for a DBMS is to track the progress of datasets through a collection of processing stages. The requirements on a DBMS in this area are rather different from those below, specifically:
- Handling metadata rather than actual scientific data
- Must have high reliability and availability
- Must support transactions with rollback (because there may be concurrent writes to the same table)
Because of these differences, and because the VISTA project is now managed independently, the requirements of a DBMS for pipeline processing have not been considered further in this evaluation exercise.
- Data archiving: Existing data archives have adopted a wide range of DBMS packages, so it is clear that the basic requirements can be easily satisfied by just about any package. But data archives of the future will need to be more capable, and the more demanding aspects are likely to be:
- Storage of huge volumes of data.
- Handling complex data strucures - object-oriented and object-relational DBMS have looked attractive here, but OODBMS have not gained a significant following in scientific databases or elsewhere.
- XML interfaces: so that SOAP and WSDL interfaces can be easily added.
- Many data files are getting close to the 2 GB limit of a 32-bit file system, so operation on 64-bit file systems will soon be important.
- Data mining: This term means different things to different groups of people, and at the lower end of the scale are a range of operations which can be carried out over the network using the more capable of the existing data archives. New data archives will be planned with data mining in mind, so it makes sense to choose a DBMS which can handle both aspects. But certain very demanding investigations, particularly those which require very large tabular datasets to be scanned sequentially, may overwhelm existing data archives. These may also be difficult to carry out with data accessed over a network (because of bandwidth and latency limitaitons). AstroGrid proposes to support such operations by providing dedicated data warehouse facilities, which have both large fast disc stores, and farms or clusters of fast processors. The scalability of DBMS to such an environment was therefore also of interest.
- AstroGrid data management: the management of these data warehouses, and concepts like MySpace, will need some DBMS techniques, but these are likely to be fairly modest in scale, and not especially demanding.
Which DBMS
We were able to carry out a limited evaluation of the two leading freeware DBMS:
PostgreSQL and MySQL, and two of the leading commercial DBMS: Oracle and Microsoft SQL Server. Originally we hoped to include IBM DB2 and Sybase, but have not yet had time to do this. Of these all but Mysql claim to be object-relational, in that they have some facilities for handling user-defined data structures.
What to Test
The basic criteria for the evalutation were listed in
this document which has been open for comments for some time.
As we expected, some of requirements were fully satisfied by all DBMS packages (e.g. provision of trigonometric functions), while other differences which we had not anticipated came to light during testing. For this reason the table below, which gives basic information about the four DBMS selected for evaluation, does not exactly correspond to our original list of criteria.
Disclaimer: the information on these pages, including the tables below, are the results of a preliminary and incomplete evaluation of the database management packages for handling scientific data, a role for which they were not expressly designed. Some of the information may well be incorrect or paint an incomplete picture. Please let the authors know of any errors that you find.
| |
PostgreSQL |
MySQL |
Oracle |
SQL Server |
DB2 |
| Supplier |
www.postgres.org |
www.mysql.com |
Oracle Corporation |
Microsoft Corporation |
IBM |
| Licence |
free (GPL) |
free (GPL) or commercial |
commercial |
commercial |
commercial |
| Market position |
widely regarded as most powerful freeware DBMS |
most popular freeware DBMS |
DBMS market leader |
DBMS from world's largest software company |
Leading software/hardware company |
| Host platforms |
Linux, many Unix, Windows |
Linux, many Unix, Windows |
many Unix, Linux, PC/Windows, IBM mainframes |
Windows only |
linux, windows, other |
| Test platform |
|
|
|
|
|
| Version tested |
v7.2.1 |
3.23.51 |
Oracle 9i v9.2.0.1.0 |
SQLServer 2000 Enterprise Edition |
v 7.2 |
| Test hardware |
Pentium 450 MHz |
Pentium 450 MHz |
Compaq XP900 workstation |
Pentium III 1.2GHz |
Athlon 1.7GHz |
| Test operating system |
Linux (Redhat 7.2) |
Linux (Redhat 7.2) |
Tru64 Unix v5.1 |
Windows 2000 Advanced Server |
Linux Redhat 7.3 |
| Test location |
Leicester |
Leicester |
Edinburgh |
Edinburgh |
Leicester |
| Time to download & install and start using |
~30 mins |
~10 mins |
~2 weeks |
~15 mins |
few hours |
| Documentation |
on-line tutorial, user's, admin, programming guides |
On-line reference Manual, tutorials |
About 100 manuals (all on-line) |
Online manuals |
Many online manuals |
| Query language |
|
|
|
|
|
| SQL92 level |
approx entry level |
less than entry level |
entry level |
entry level |
entry level |
| Transactions with rollback |
yes |
only with InnoDB |
yes |
yes |
yes |
| Programming language? |
PL/PGSQL |
none |
PL/SQL |
Transact-SQL |
PL/SQL |
| Statistical functions |
AVG, STTDEV, VARIANCE |
AVG, STDDEV |
AVG, STDDEV, VARIANCE, and many cor correlation and regression |
AVG, STDEV, STDEVP, VAR, VARP |
AVG, STDDEV, VAR |
| Outer joins |
left, right, full |
left, right |
left, right, full |
left, right, full |
left, right, full |
| LIMIT/TOP n, OFFSET |
LIMIT, OFFSET |
LIMIT |
equivalent functionality |
TOP |
equivalent functionality |
| User-defined functions |
in PL/PGSQL or C |
in C only |
PL/SQL or Java |
in Transact-SQL, C, C++, VB,... |
C, Java, Perl, SQL, ... |
| Data types |
|
|
|
|
|
| Integer |
2,4,8 bytes |
1,2,4,8 bytes |
NUMBER(38) |
1,2,4,8 bytes |
2,4,8 bytes |
| Floating point |
4,8 bytes |
4,8 bytes |
NUMBER(p,s) |
4,8 bytes |
4, 8 bytes |
| Boolean |
yes |
yes |
no |
yes |
no |
| date/time |
date,time,timestamp |
date,time,datetime,timestamp |
date, timestamp |
datetime, smalldatetime, timestamp, interval |
date, time, timestamp |
| string |
char,varchar |
char,varchar |
char,varchar,long |
char,varchar,text, binary, varbinary |
char, varchar, long varchar |
| spatial types? |
rectangular box |
- |
with spatial option |
no |
as an option (geographical) |
| BLOBs |
yes |
yes |
yes |
yes |
yes |
| user-defined? |
yes |
no |
yes |
yes |
yes |
| Indexing |
B-tree, R-tree on box |
B-tree |
B-tree (R-tree with spatial data option) |
B-tree |
B-tree |
| Software Interfaces |
|
|
|
|
|
| Language APIs |
C, C++, Java, ODPC, Perl, PHP, Python, TCL |
C, C++, Eiffel, ODBC, Java/JDBC, Perl, PHP, Python, TCL |
C, C++, COBOL, Fortran, Java, PL/I |
C++, J++, VB, ODBC, ADO, OLE DB, Jscript, Vbscript |
C, C++, Java, Perl |
| Data import/export |
|
|
|
|
|
| Binary import |
? badly documented |
no |
? |
yes for image (jpeg, tiff) only |
jpeg, tiff only |
| Binary export |
yes, intractable format |
no |
no |
yes |
no |
| HTML export |
yes |
yes |
yes |
yes |
yes |
| XML export |
only via HTML |
only via HTML |
yes |
yes |
yes |
| Scalability |
|
|
|
|
|
| Mapping to host file system |
complex: table maps to many files |
simple: database is a directory, table is a file |
complex |
simple: database is single file, log another, also complex: db maps to file groups for DB and logs |
complex |
| Support for 64-bit cpus |
Alpha, Ultrasparc, IBM, SGI, Intel chips |
Alpha, Ultrasparc |
yes |
yes, in beta - for Itanium chips |
yes |
| Support for SMP systems |
performance gain only for multiple concurrent queries |
none obvious |
yes |
automatically sets degree of parallelisation: can perform parallel queries, indexing,etc |
Yes, and parallelization strategies could even work in SP mode (IBM says) |
| Max table size |
64 TB (less for some O/S) |
4 GB (more on 64-bit file systems) |
no obvious limit |
1 Exabyte |
512 GB |
| Max single field |
1 GB |
? |
2 GB |
2 GB |
? |
| Max number of rows |
"unlimited" |
no obvious limit |
no obvious limit |
limited by storage |
4 x 109 |
| Max number of columns |
1600 |
? |
1000 |
1024 |
1012 |
| Max indexed columns |
unknown |
32 (more by recompiling) |
no limit |
250 per table |
32767 or storage limited |
Performance Tests
A number of performance tests were carried out.
The test datasets we used were samples of the following all-sky catalogs covering about 400 square degrees in a box from 0 to 4 hours RA and 30 to 37.5 degrees declination, which amounted to a sample of about 1% of the sky.
- GSC - Hubble Guide Star Catalog v1.2: 275,154 rows, 11 columns including PCODE.
- USNO-A2 - US Naval Observatory Catalog: 3,476,948 rows, 6 columns including PCODE.
These were converted from the original FITS tables or binary format into text files, as most DBMS are unable to ingest data in any more efficient format. Note: the additional PCODE column was computed during the conversion to plain ASCII, as were the addition rows required by the PCODE algorithm. The times reported in these tests are the elapsed times, not just cpu times, since I/O overhead is often a significant factor, but care was taken that the machine was virtually idle during the testing, apart from the DBMS processes.
Catalogue (fuzzy) Joins: R-trees versus PCODE
The cross-matching of celestial sources in two (or more) catalogues of the sky is an important operation, being the basis of many astronomical data mining investigations. In principle it merely requires a
JOIN between tables, but this has to be based on approximate match of coordinates within combined error limits, and using a
great circle distance, since the cartesian approximation fails at high declinations.
Until recently it was generally assumed that to perform a fuzzy join efficiently it was necessary to have a spatial index available, but spatial indexing is supported by only a few DBMS. Recently, however, I devised an alternative algorithm which I called the PCODE method, being based on a pixellation of the sky. This is described more fully in my
SkyIndexing document. The PCODE method requires an additional PCODE column in the tables, and some extra rows, but the DBMS only has to carry out an integer equi-join, followed by some simple post-processing to remove a small number of spurious matches and duplicated results, so it is within the capabilities of any common-or-garden DBMS.
I thought it would be interesting to compare these methods on a suitable DBMS. Postgres has a built-in spatial data type with R-tree indexing, while Oracle has a Spatial Data Option with similar or perhaps more advanced facilities. It turned out to be much easier to get PostgreSQL working than Oracle, and the results are shown in the next table.
| |
R-tree |
PCODE with B-tree |
| Time to load USNO |
275 s |
243 s |
| Time to create index |
6408 s |
233 s |
| Size of resulting table |
496 MB |
293 MB |
| Size of index |
248 MB |
62 MB |
| Time to compute fuzzy join |
347 s |
143 s |
The superiority of the PCODE join seems clear. In particular the time to create the R-tree index is rather high: if it scales linearly (and it may be worse than that) the time to create an R-tree of the whole USNO-A2 catalogue rather than under 1% of it would be just under 14 days.
This is consistent with results reported at an ADASS conference.
PostgreSQL versus MySQL versus cFITSIO
The following tests compare the two freeware DBMS using the same computer (a Pentium-II with 450 MHz processor and internal IDE discs). In order to assess the overheads (or otherwise) of using a DBMS, I decided also to compare these two with compiled code using the cFITSIO library to access the same datasets stored in a FITS binary table. These tests were limited to (a) a simple join on PCODE and (b) a sequential scan for mean and standard deviation.
For the DBMS two joins were computed:
- A basic equi-join on PCODE columns
- The equi-join followed by the necessary great-circle distance filter to weed out the sources with identical PCODE values but not actually within the required distance,
Various other tests, including a LEFT OUTER join, were also computed, but the times were generally comparable, and not worth showing separately. Finally a sequential scan operation was carried out, finding the mean and standard-deviation of a single column, and then (for the DBMS) selecting points more than 3 sigma above the mean (there were only 18 points out of 3.5M like this).
| |
PostgreSQL |
MySQL |
cFITSIO |
DB2 |
| Time to load USNO sample |
243 s |
80 s |
143 s |
30 s |
| Time to create B-tree index |
233 s |
72 s |
36 s |
50 s |
| Size of resulting table |
293 MB |
129 MB |
111 MB |
184 MB |
| Size of index file |
62 MB |
32 MB |
30 MB |
~30 MB |
| Equijoin on PCODE |
143 s |
121 s |
102 s |
75 s |
| Find mean,stddev of usno (3.5M rows) |
54 s |
7.3 s |
2.0 s |
9 s |
| Select from usno for bmag>mean+3sigma |
126s |
13.8 s |
- |
15 s |
| select count(*) from usno; |
41 s |
0 s |
- |
5 s |
It is clear that there is a significant DBMS overhead on most operations, and especially on those involving a sequential scan. But it should be noted that the B-tree algorithm used here is relatively basic, and no doubt could be optimised to perform better. The data loading of the FITS file was also slow: again no attempt was made to optimise this at all.
Effect of Processor Speed
Most of the evaluation work done at Leicester used an elderly PC with a Pentium-II running at 450 MHz. I ran some tests of Mysql using a 1200 MHz Pentium, otherwise fairly similar. The received wisdom is that most database operations are I/O limited, but the figures below suggest otherwise. All times are elapsed times in seconds.
| |
450 MHz |
1200 MHz |
ratio |
| Import USNO |
81.1 |
49.7 |
1.6 |
| Import GSC |
8.9 |
3.3 |
2.7 |
| Create index on USNO |
64.9 |
65.1 |
1.0 |
| Equijoin on PCODE |
126.5 |
21.1 |
6.0 |
| Find mean and stddev of BMAG |
7.0 |
1.8 |
3.9 |
| Select bmag > 23.2 |
6.8 |
1.5 |
4.5 |
I do not understand all these figures, particularly the fact that index creation did not seem faster, but in the other cases the operations are clearly speeded up by having a faster cpu. The fact that the FITSIO operations are so much faster than using Mysql also suggests that disc I/O is not a limiting factor in most operations. This I find somewhat surprising.
Column-oriented Tables versus Row-oriented Tables
I am grateful to Prof Peter Buneman of Edinburgh University for rekindling my interest in this topic. He pointed out that where it is necessary to do a sequential scan of a large table, but only one or two columns are of interest out of a large number, it should be much more efficient to store the data for each column contiguously, rather than each row. I was able to point out to him that astronomers had already thought of this: the ESO/MIDAS Table System stores its data column-wise, and the STSDAS table file system (an add-on to IRAF produced at STScI) the creator of a file has the option of using row or column orientation. It then occurred to me that the FITS binary table format might just be perverted to store data column-wise: the necessary change is to have a table with just one row, but to have each column defined as a vector of NR elements, where there would normally be NR rows. I consulted the author of the FITSIO library, Willian Pence of GSFC, who told me that there was nothing in the library which would prevent such access being handled efficiently. Fortunately the interface routines are flexible enough to make this perfectly feasible.
I therefore wrote a couple of short programs to first convert the usno.fits file into column-major order, and then rewrite my fstats program to handle this format. Neither task took long.
The results are shown here for the computation of mean and standard deviation on a single column (bmag) of the usno sample table (3.5 million rows), using the 450 MHz PC:
| |
FITSIO row-orientated |
FITSIO column-oriented |
| Find mean and std.devn of 3.5M rows |
1.95 s |
0.66 s |
The speed-up factor of around 3 is perhaps not all that impressive, but this is a table of only seven columns, and perhaps the overheads of starting the program and opening the FITS table are significant. Using a larger table would throw light on this. But it is notable that an operation which in
PostgreSQL takes 54 seconds, can be speeded up by a factor of 80 by using FITSIO and column-orientation. The final figure, 0.66 seconds, to retrieve 3.5 million values of 4 bytes each equates to a bandwidth of 21 MB/s. I suspect that this is near the bandwidth limit of the internal IDE disc in this PC, nearly 3 years old. It is discouraging that the best DBMS that we have tested cannot even manage even a tenth of this speed.
Detailed Notes
More detailed notes on the various evaluations are available in separate documents:
Preliminary Conclusions
For data archiving, it is obvious from the number of systems using all these DBMS and many more, that reasonable facilities are provided by any modern DBMS. It is not clear that the ability of the object-relational DBMS is of significant benefit. All these DBMS will store BLOBs (binary large objects) and such facilities have been available for many years, but hardly any astronomical archives have chosen to use them, and all those that I know also duplicate these objects in external storage. The ability to define one's own data type in principle gives superior facilities: the ability to index on the properties of such an object. In practice, however, the effort involved (which requires the creating of suitable indexing functions, and interfacing with the query optimiser to take advantage of such indices) seems disproportionate to the gain.
Mysql seems generally faster than Postgres, and has an appealing simplicity. We cannot directly compare the speeds of these two free DBMS with the commercial ones, because we did not use the same hardware, but there does not appear to be a big difference in the query performance.
The commercial DBMS vendors have recently put considerable effort into getting their products to take advantage of PC clusters or farms; it may be possible to get the free DBMS to scale as well, but this is not built-in, it will probably require a lot of development effort.
As far as data mining is concerned these systems all have quite severe limitations, although there is no single "show stopper", they look far from ideal. It is hard to separate the DBMS limitations from those of SQL, but since SQL is essentially the only interface, perhaps such a distinction is unnecessary.
- Non-procedural syntax: The syntax of SQL is result-orientated, which most scientists find counter-intuitive, since they are used to breaking a complex operation down in to simpler stages. With SQL you don't do that, indeed you often have to work backwards from the desired result.
- Multi-line commands in SQL are the norm, but it is hard to get them right; error messages are cryptic and generally unhelpful.
- Poor standards-compliance: Although there is an international standard for SQL92, there is hardly a single DBMS on the market which even implements the whole of its Entry Level (the lowest of four levels). And since NIST gave up checking for compliance some years ago, the situation is likely to get worse. This particularly affects the VO as users are likely to interact with a number of different DBMS, all speaking different dialects. Although the basic facilities can be wrapped in GUI functions, the power user may want the full power of SQL. Many commands which will often be required, such as buld loading of data, exporting to a text or HTML file, performing outer joins, and selecting into another table, all involve different SQL on different DBMS. Even something as simple as squaring a value can be done, variously, as POW(x,2), or x^2, or SQUARE(x). I think the POW function is the standard form, but it's hard to tell.
- No metadata: nothing besides the column name and its data type is handled by the DBMS. It is possible to devise complex schema in which separate tables hold metadata, but if you use SQL to select or project new tables, these metadata will be left behind.
- Floating-point numbers are displayed poorly, with arbitrary switching from fixed-point to exponential notation, and arbitrary decimal places. Most DBMS have functions to format such numbers in a controlled way, to get aligned decimal points in columns, but many of these are non-standard. None of the DBMS have functions built-in which can input or output angles in sexagesimal format; in principle one could use the formatting functions provided to build suitable functions, but the languages in which external functions can be written differ surprisingly from one DBMS to another.
- Nulls are supported fully by all the DBMS examined so far, but there appears to be no standard way of representing them in imported or exported text files.
- Binary import/export is an almost essential facility since many sources of data will be large, and probably available in FITS binary format, but is almost never supported. Conversion to text is liable to lose precision, is error-prone, slow, and uses unnecessary volumes of disc space.
- User-written functions can be provided in all systems examined so far, but the languages accepted vary, and none of the mechanisms is very easy to use.
- Resource usage in terms of the disc space likely to be occupied by a table after ingestion is very hard to estimate on most systems. Here Mysql is an exception, and its overheads (with the Myisam backend) are low.
- Network access neither of the free DBMS have any mechanism for accessing tabular information present on other networked computers. None of them supports any truly distributed database. In contrast my
fstats program was instantly able to process a file present on a remote system, merely by giving it a URL instead of the filename containing the FITS table. When reading a file on the LAN the speed was almost as fast as when reading a local file. FTP and HTTP access are built into the FITSIO library, of course, so this comes free.
- Graphical operations are conspicuous by there absence: a data exploration system really ought to allow the user to plot histograms, or plot one column against another, and perform similar operations. Even producing a simple histogram as a table involves some messy SQL.
I am not sure what the solution to the data mining and data exploration problem is, but I think that an SQL-speaking DBMS forms only a small part of it.
-- Clive Page, and other members of the
AstroGrid database team, - 19 Aug 2002
I just came across an interesting comparison of Mysql, Postgres, and Oracle on the CERN web-site - somewhat out-of-date, but still useful in parts. See
http://hep-proj-database.web.cern.ch/hep-proj-database/db_compar.htm
Note added 2003 July 30: unfortunately this link seems now to be broken and I cannot find the document even on the wayback machine. It contained some strong opinions on these products, and one can only speculate on why it has been withdrawn.
Another interesting document from CERN was presented at GGF5 on their
Spitfire Project, see this
PDF file
Another interesting comparison of SQL support in DBMS comes from
Troels Arvin and
his links.
Contributors to this page include Bob Mann and Clive Davenhall of the University of Edinburgh, and Patricio Ortiz and Clive Page of the University of Leicester.
-- Last updated by Clive Page, 15 Dec 2003.