Notes on MySQL, PostgreSQL, and DB2.
Starting and stopping server and client
| |
Mysql |
Postgres |
DB2 |
| Start server |
# safe_mysqld --skip-grant-tables |
> pg_ctl reload -D /home/data/postgres -l logfile |
|
| Stop server |
> mysqladmin shutdown |
> pg_ctl stop |
|
| Start client |
> mysql |
> psql |
> db2 |
| Stop client |
mysql> \q |
cgp=# \q |
db2 => quit |
Note: for DB2 the
db2batch facility was more useful: basic syntax:
db2batch -f test.sql -d sample -o f 10.
The
-o f 10 specification limits results to 10 rows (a limit can also be specified within SQL).
Basic operations
| Operation |
MySQL |
Postgres |
DB2 |
| Show databases |
SHOW DATABASES; |
\l |
|
| Show tables in current database |
SHOW TABLES; |
\d |
LIST TABLES |
| Show fields in a table |
DESCRIBE tbl; |
\d tbl |
DESCRIBE TABLE tbl |
| Show indexes |
SHOW INDEX FROM tbl; |
\di |
DESCRIBE INDEXES FOR TABLE tbl |
| Get elapsed time of each operation |
(default) |
\timing |
(use db2batch) |
| Execute SQL queries from a text file |
\. filename |
\i filename |
(use db2 or db2batch) |
| Execute shell script when logged in to DBMS |
\! filename |
\! filename |
|
| Send results to file |
\T filename |
\o filename |
db2 -z outfile or db2batch -r outfile |
| Set/unset pager for output |
\p more (to cancel \n) |
\pset pager |
|
| Quit |
\q |
\q |
quit |
Data Types
This is a table of the basic numerical datatypes, with their names or
aliases in the various DBMS.
| Data Type |
MySQL |
Postgres |
DB2 |
| Integer 1-byte |
TINYINT |
- |
- |
| Integer 2-bytes |
SMALLINT |
SMALLINT |
SMALLINT |
| Integer 4-byte |
INT |
INT |
INT |
| Integer 8-byte |
BIGINT |
BIGINT |
BIGINT |
| Floating-point 4-byte |
FLOAT, FLOAT4 |
REAL, FLOAT4 |
REAL |
| Floating-point 8-byte |
DOUBLE, REAL, FLOAT8 |
DOUBLE PRECISION, FLOAT8 |
DOUBLE, FLOAT |
| Logical |
- |
BOOLEAN |
- |
The types in
bold are the ones likely to give trouble: the meanings of
FLOAT and
REAL being reversed between MySQL and DB2. The only way to get compatibility between
MySQL and Postgres was to use the non-standard data types FLOAT4 and FLOAT8, but these are not supported by DB2. The presence of a TINYINT type in
MySQL is a bonus since may astronomical tables contain small integers, and it is a pity that other BMS do not support it.
SQL commands - portable
ALTER TABLE tbl ADD COLUMN (col1 dtype1, ...) ;
CREATE DATABASE db ;
CREATE INDEX ind ON tbl(col) ;
CREATE TABLE tbl(col1 type1 [NOT NULL], col2 type2, ... ) ;
DROP TABLE tbl ;
INSERT INTO ... SELECT FROM ... ;
SELECT [DISTINCT] cols FROM tbl [ [LEFT OUTER] JOIN tbl2] [WHERE exprn]
[GROUP BY col] [ORDER BY col [DESC]] ;
UPDATE tbl SET col = exprn [WHERE exprn] ;
SQL Commands - DBMS-specific
| |
MySQL |
Postgres |
DB2 |
| Create spatial index |
CREATE SPATIAL INDEX ind ON tbl(col); |
CREATE INDEX ind ON tbl USING RTREE(col) ; |
CREATE INDEX ind ON tbl(col) EXTEND USING db2gse.spatial_index(gs1,gs2,gs3) ; |
| Load CSV data from text file |
LOAD DATA INFILE '/path' INTO TABLE tbl [FIELDS TERMINATED BY "#"] ; |
COPY tbl FROM '/path' [WITH DELIMITER '#'] |
LOAD FROM filename OF DEL [MODIFIED BY coldel#] METHOD P(1,2,3,...) INSERT INTO tbl ; |
| Limit rows returned by SELECT (optional offset) |
SELECT ... LIMIT [offset,] rows ; |
SELECT ... LIMIT rows [OFFSET offset] ; |
SELECT ... FETCH FIRST rows ROWS ONLY ; |
| Gather statistics so index is used |
(done automatically) |
ANALYZE [VERBOSE] tbl ; |
(done automatically) |
| SELECT returning results into a new table |
CREATE TABLE newtbl SELECT ... ; |
SELECT ... INTO TABLE newtbl ... ; |
CREATE TABLE newtbl (col1 type1, ...); INSERT INTO newtbl SELECT ... ; |
| Drop index |
DROP INDEX ind ON tbl ; |
DROP INDEX ind ; |
DROP INDEX ind ; |
Notes on limitations of MySQL and Postgres
- Spatial indexing
- Only supported in version 8.1-alpha onwards. At the time of testing funtionality was incomplete, and performance was poor.
- No defined external data format for spatial data types (except ones designed to interface to Geographic Information systems). I had to use INSERT statements to load data, which was very slow.
- Spatial storage requires 5-point closed polygons as simplest representation of a box - somewhat verbose and inefficient.
- The installation on Linux from RPM insisted on putting all the data on the /var partition, which was too small to hold it. All methods of changing to an alternative data directory failed. The manual suggested the only cure was to get a source distribution and recompile with different configuration file. Eventually I found information in newsgroups about this: it is possible to fool it using soft links.
- Data types are somewhat non-standard: I eventually used FLOAT4 and FLOAT8 which are compatible with MySQL and Postgres, but not DB2.
- Has many minor differences from SQL Standard, for example || is used to mean logical OR not string concatentation.
- The evaluation of MySQL queries seems to be rather less safe than with other DBMS. These sites of MySQL gotchas and Troels Arvins have more details.
- User-defined functions have to be written in C and linked into the MySQL executable - more a system administrator function than something the average user can use easily.
- Has no equivalent of the clustered index of Postgres to speed up indexed access.
- No equivalent of the Postgres dblink utility to access remote tables.
- No equivalent of Postgres schemas, which allow different users to have different namespaces for their tables.
- MySQL maps one table to a file on the operating system: large tables can easily correspond to files of more than 2 GB in size. The version of Linux is use coped with this without problem, but some older operating systems might not. Postgres always mapped large tables to a series of smaller files.
Postgres
- After every INDEX creation (or a substantial update) an ANALYZE operation has to be executed, or the index will be ignored by the query optimiser. The ANALYZE is substantially faster in version 7.4, but it is still a nuisance that it has to be executed.
- SELECT COUNT(*) operations were very slow: Postgres has no internal counts of non-deleted rows, it always insists on counting afresh, while MySQL gives almost instant results for simple count operations.
- Some queries were found to run more slowly when two indices were available than only one, which was odd.
- If an index is created on a column of REAL (FLOAT4) type, a simple query expression such as
SELECT * FROM table WHERE mag BETWEEN 12.5 AND 13.5 will fail to use the index, as the constants in the WHERE expression are by default of DOUBLE PRECISION (FLOAT8) type and not converted. The fix for this is to rewrite all such queries with an explicit type cast in them, for example =SELECT * from table WHERE mag BETWEEN 12.5::real AND 13.5::real; This is a real nuisance, and hard to explain to the general user.
- When Postgres does an UPDATE operation it always creates new records for the new information; the old records are simply rendered inactive. This can use a lot of disc space. The remedy is to execute a
VACUUM FULL operation, which removes inactive records, moves the active data to the base of the table, and truncates the files.
- The function for exponentiation was called POW whereas all other DBMS investigated called it POWER; this could be fixed with a suitably-named user-defined function, I suppose. Postgres also has an exponentiation operator
^, but this is non-standard.
Data Storage and I/O
These are important considerations when using large tables. A sequence of operation required in loading source catalogues and generating a spatial index on each was the following:
| Operation |
MySQL |
Postgres |
| (1) ALTER TABLE table ADD COLUMN errbox; |
Creates a new table with rows longer than before one because of the new (empty) field; deletes the old table; renames the new one |
Just marks the metadata, makes no change to the table |
| (2) UPDATE TABLE table SET errbox = expression; |
Alters each row in place to insert the value |
Creates a new set of rows in the same table with updated values; marks the old rows as inactive |
| (3) VACUUM FULL ANALYZE table; |
(not needed with MySQL) |
Reformats the table so all active rows are at the beginning; truncates the file to remove inactive rows; produces statistics for query optimizer |
I think it is possible to see that the maximum disc space used by each DBMS is similar: the maximum with Mysql is temporary, just before the renaming takes place at the end of step (2). With Postgres the size is not reduced until the VACUUM is complete. Each of them also writes each affected row twice: Mysql when first extending the rows to add empty space, than secondly when updating them in-place. Postgres does the first write when doing the UPDATE, and a second time when it has to re-write each row during the VACUUM.
With Postgres, however, it is possible to use a CLUSTER command to require that the data organisation in the main table follows that of the named principal index. This requires a wholesale rewriting of the table, which it does by copying the data into a new table and renaming it, and deleting the old one. If this is done it requires no VACUUM FULL nor ANALYZE: the index is usable immediately. This, I think, uses as many I/O operations as the simple UPDATE and VACUUM, but one ends up with a more efficient indexed access, since related data are stored as far as possible contiguously on disc. MySQL has no equivalent of this at present. Conclusion: they are about equally inefficient in doing this update, but Postgres can end up with a superior result.
Postgres has larger disc storage overheads, typically about 40 bytes/row in addition to the sum of the field-widths. The corresponding overhead for MySQL is about one byte/row.
--
ClivePage - 8 Feb 2005