Notes on MySQL, PostgreSQL, and DB2.

Starting and stopping server and client

Mysql Postgres DB2
Start server # safe_mysqld --skip-grant-tables > pg_ctl reload -D /home/data/postgres -l logfile  
Stop server > mysqladmin shutdown > pg_ctl stop  
Start client > mysql > psql > db2
Stop client mysql> \q cgp=# \q db2 => quit

Note: for DB2 the db2batch facility was more useful: basic syntax: db2batch -f test.sql -d sample -o f 10. The -o f 10 specification limits results to 10 rows (a limit can also be specified within SQL).

Basic operations

Operation MySQL Postgres DB2
Show databases SHOW DATABASES; \l  
Show tables in current database SHOW TABLES; \d LIST TABLES
Show fields in a table DESCRIBE tbl; \d tbl DESCRIBE TABLE tbl
Show indexes SHOW INDEX FROM tbl; \di DESCRIBE INDEXES FOR TABLE tbl
Get elapsed time of each operation (default) \timing (use db2batch)
Execute SQL queries from a text file \. filename \i filename (use db2 or db2batch)
Execute shell script when logged in to DBMS \! filename \! filename  
Send results to file \T filename \o filename db2 -z outfile or db2batch -r outfile
Set/unset pager for output \p more (to cancel \n) \pset pager  
Quit \q \q quit

Data Types

This is a table of the basic numerical datatypes, with their names or aliases in the various DBMS.

Data Type MySQL Postgres DB2
Integer 1-byte TINYINT - -
Integer 2-bytes SMALLINT SMALLINT SMALLINT
Integer 4-byte INT INT INT
Integer 8-byte BIGINT BIGINT BIGINT
Floating-point 4-byte FLOAT, FLOAT4 REAL, FLOAT4 REAL  
Floating-point 8-byte DOUBLE, REAL, FLOAT8 DOUBLE PRECISION, FLOAT8 DOUBLE, FLOAT
Logical - BOOLEAN -

The types in bold are the ones likely to give trouble: the meanings of FLOAT and REAL being reversed between MySQL and DB2. The only way to get compatibility between MySQL and Postgres was to use the non-standard data types FLOAT4 and FLOAT8, but these are not supported by DB2. The presence of a TINYINT type in MySQL is a bonus since may astronomical tables contain small integers, and it is a pity that other BMS do not support it.

SQL commands - portable

    ALTER TABLE tbl ADD COLUMN (col1 dtype1, ...) ;

    CREATE DATABASE db ;
    
    CREATE INDEX ind ON tbl(col) ;
    
    CREATE TABLE tbl(col1 type1 [NOT NULL], col2 type2, ... ) ;
    
    DROP TABLE tbl ;
    
    INSERT INTO ... SELECT FROM ... ;

    SELECT [DISTINCT] cols FROM tbl [ [LEFT OUTER] JOIN tbl2] [WHERE exprn]
     [GROUP  BY col] [ORDER BY col [DESC]] ;
    
    UPDATE tbl SET col = exprn [WHERE exprn] ;

SQL Commands - DBMS-specific

MySQL Postgres DB2
Create spatial index CREATE SPATIAL INDEX ind ON tbl(col); CREATE INDEX ind ON tbl USING RTREE(col) ; CREATE INDEX ind ON tbl(col) EXTEND USING db2gse.spatial_index(gs1,gs2,gs3) ;
Load CSV data from text file LOAD DATA INFILE '/path' INTO TABLE tbl [FIELDS TERMINATED BY "#"] ; COPY tbl FROM '/path' [WITH DELIMITER '#'] LOAD FROM filename OF DEL [MODIFIED BY coldel#] METHOD P(1,2,3,...) INSERT INTO tbl ;
Limit rows returned by SELECT (optional offset) SELECT ... LIMIT [offset,] rows ; SELECT ... LIMIT rows [OFFSET offset] ; SELECT ... FETCH FIRST rows ROWS ONLY ;
Gather statistics so index is used (done automatically) ANALYZE [VERBOSE] tbl ; (done automatically)
SELECT returning results into a new table CREATE TABLE newtbl SELECT ... ; SELECT ... INTO TABLE newtbl ... ; CREATE TABLE newtbl (col1 type1, ...); INSERT INTO newtbl SELECT ... ;
Drop index DROP INDEX ind ON tbl ; DROP INDEX ind ; DROP INDEX ind ;

Notes on limitations of MySQL and Postgres

MySQL

  • Spatial indexing
    • Only supported in version 8.1-alpha onwards. At the time of testing funtionality was incomplete, and performance was poor.
    • No defined external data format for spatial data types (except ones designed to interface to Geographic Information systems). I had to use INSERT statements to load data, which was very slow.
    • Spatial storage requires 5-point closed polygons as simplest representation of a box - somewhat verbose and inefficient.
  • The installation on Linux from RPM insisted on putting all the data on the /var partition, which was too small to hold it. All methods of changing to an alternative data directory failed. The manual suggested the only cure was to get a source distribution and recompile with different configuration file. Eventually I found information in newsgroups about this: it is possible to fool it using soft links.
  • Data types are somewhat non-standard: I eventually used FLOAT4 and FLOAT8 which are compatible with MySQL and Postgres, but not DB2.
  • Has many minor differences from SQL Standard, for example || is used to mean logical OR not string concatentation.
  • The evaluation of MySQL queries seems to be rather less safe than with other DBMS. These sites of MySQL gotchas and Troels Arvins have more details.
  • User-defined functions have to be written in C and linked into the MySQL executable - more a system administrator function than something the average user can use easily.
  • Has no equivalent of the clustered index of Postgres to speed up indexed access.
  • No equivalent of the Postgres dblink utility to access remote tables.
  • No equivalent of Postgres schemas, which allow different users to have different namespaces for their tables.
  • MySQL maps one table to a file on the operating system: large tables can easily correspond to files of more than 2 GB in size. The version of Linux is use coped with this without problem, but some older operating systems might not. Postgres always mapped large tables to a series of smaller files.

Postgres

  • After every INDEX creation (or a substantial update) an ANALYZE operation has to be executed, or the index will be ignored by the query optimiser. The ANALYZE is substantially faster in version 7.4, but it is still a nuisance that it has to be executed.
  • SELECT COUNT(*) operations were very slow: Postgres has no internal counts of non-deleted rows, it always insists on counting afresh, while MySQL gives almost instant results for simple count operations.
  • Some queries were found to run more slowly when two indices were available than only one, which was odd.
  • If an index is created on a column of REAL (FLOAT4) type, a simple query expression such as SELECT * FROM table WHERE mag BETWEEN 12.5 AND 13.5 will fail to use the index, as the constants in the WHERE expression are by default of DOUBLE PRECISION (FLOAT8) type and not converted. The fix for this is to rewrite all such queries with an explicit type cast in them, for example =SELECT * from table WHERE mag BETWEEN 12.5::real AND 13.5::real; This is a real nuisance, and hard to explain to the general user.
  • When Postgres does an UPDATE operation it always creates new records for the new information; the old records are simply rendered inactive. This can use a lot of disc space. The remedy is to execute a VACUUM FULL operation, which removes inactive records, moves the active data to the base of the table, and truncates the files.
  • The function for exponentiation was called POW whereas all other DBMS investigated called it POWER; this could be fixed with a suitably-named user-defined function, I suppose. Postgres also has an exponentiation operator ^, but this is non-standard.

Data Storage and I/O

These are important considerations when using large tables. A sequence of operation required in loading source catalogues and generating a spatial index on each was the following:

Operation MySQL Postgres
(1) ALTER TABLE table ADD COLUMN errbox; Creates a new table with rows longer than before one because of the new (empty) field; deletes the old table; renames the new one Just marks the metadata, makes no change to the table
(2) UPDATE TABLE table SET errbox = expression; Alters each row in place to insert the value Creates a new set of rows in the same table with updated values; marks the old rows as inactive
(3) VACUUM FULL ANALYZE table; (not needed with MySQL) Reformats the table so all active rows are at the beginning; truncates the file to remove inactive rows; produces statistics for query optimizer

I think it is possible to see that the maximum disc space used by each DBMS is similar: the maximum with Mysql is temporary, just before the renaming takes place at the end of step (2). With Postgres the size is not reduced until the VACUUM is complete. Each of them also writes each affected row twice: Mysql when first extending the rows to add empty space, than secondly when updating them in-place. Postgres does the first write when doing the UPDATE, and a second time when it has to re-write each row during the VACUUM.

With Postgres, however, it is possible to use a CLUSTER command to require that the data organisation in the main table follows that of the named principal index. This requires a wholesale rewriting of the table, which it does by copying the data into a new table and renaming it, and deleting the old one. If this is done it requires no VACUUM FULL nor ANALYZE: the index is usable immediately. This, I think, uses as many I/O operations as the simple UPDATE and VACUUM, but one ends up with a more efficient indexed access, since related data are stored as far as possible contiguously on disc. MySQL has no equivalent of this at present. Conclusion: they are about equally inefficient in doing this update, but Postgres can end up with a superior result.

Postgres has larger disc storage overheads, typically about 40 bytes/row in addition to the sum of the field-widths. The corresponding overhead for MySQL is about one byte/row.

-- ClivePage - 8 Feb 2005

Topic revision: r4 - 2005-02-08 - 09:43:11 - ClivePage
 
AstroGrid Service Click here for the
AstroGrid Service Web
This is the AstroGrid
Development Wiki

This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback