+ +

+ + index + +
+ + mysql + +
+ + choose-something-else + +

+ + + +

Do Not Pass This Way Again

Considering MySQL? Use something else. Already on MySQL? Migrate. For every +successful project built on MySQL, you could uncover a history of time wasted +mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense +of accomplishment over the effort spent making MySQL behave.

Thesis: databases fill roles ranging from pure storage to complex and +interesting data processing; MySQL is differently bad at both tasks. Real apps +all fall somewhere between these poles, and suffer variably from both sets of +MySQL flaws.

MySQL is bad at storage.
MySQL is bad at data processing.
MySQL is bad by design.
Bad arguments for using MySQL.

Much of this is inspired by the principles behind PHP: A Fractal of Bad +Design. I +suggest reading that article too -- it's got a lot of good thought in it even +if you already know to stay well away from PHP. (If that article offends you, +well, this page probably will too.)

Storage

Storage systems have four properties:

Take and store data they receive from applications.
Keep that data safe against loss or accidental change.
Provide stored data to applications on demand.
Give administrators effective management tools.

In a truly “pure” storage application, data-comprehension features +(constraints and relationships, nontrivial functions and aggregates) would go +totally unused. There is a time and a place for this: the return of “NoSQL” +storage systems attests to that.

Pure storage systems tend to be closely coupled to their “main” application: +consider most web/server app databases. “Secondary” clients tend to be +read-only (reporting applications, monitoring) or to be utilities in service +of the main application (migration tools, documentation tools). If you believe +constraints, validity checks, and other comprehension features can be +implemented in “the application,” you are probably thinking of databases close +to this pole.

Storing Data

MySQL has many edge cases which reduce the predictability of its behaviour +when storing information. Most of these edge cases are documented, but violate +the principle of least surprise (not to mention the expectations of users +familiar with other SQL implementations).

Implicit conversions (particularly to and from string types) can modify + MySQL's behaviour.
- Many implicit conversions are also silent (no warning, no diagnostic), + by design, making it more likely developers are entirely unaware of + them until one does something surprising.
+
Conversions that violate basic constraints (range, length) of the output + type often coerce data rather than failing.
- Sometimes this raises a warning; does your app check for those?
- This behaviour is unlike many typed systems (but closely like PHP and + remotely like Perl).
+
Conversion behaviour depends on a per-connection configuration value + (sql_mode) that has a large constellation of possible + states, making + it harder to carry expectations from manual testing over to code or from + tool to tool.
MySQL recommends UTF-8 as a character-set, but still defaults to Latin-1. + The implimentation of utf8 up until MySQL 5.5 was only the 3-byte + BMP. + MySQL 5.5 and beyond supports a 4-byte utf8, but confusingly must be set + with the character-set utf8mb4. Implementation details of these encodings + within MySQL, such as the utf8 3-byte limit, tend to leak out into client + applications. Data that does not fit MySQL's understanding of the storage + encoding will be transformed until it does, by truncation or replacement, by + default.
- Collation support is per-encoding, with one of the stranger default + configurations: by default, the collation orders characters according to + Swedish alphabetization rules, case-insensitively.
- Since it's the default, lots of folks who don't know the manual + inside-out and backwards observe MySQL's case-insensitive collation + behaviour ('a' = 'A') and conclude that “MySQL is case-insensitive,” + complicating any effort to use a case-sensitive locale.
- Both the encoding and the collation can vary, independently, by + column. Do you keep your schema definition open when you write + queries to watch out for this sort of shit?
+
The TIMESTAMP type tries to do something smart by storing values in a + canonical timezone (UTC), but it's done with so few affordances that it's + very hard to even tell that MySQL's done a right thing with your data.
- And even after that, the result of foo < '2012-04-01 09:00:00' still + depends on what time of year it is when you evaluate the query, unless + you're very careful with your connection timezone.
- TIMESTAMP is also special-cased in MySQL's schema definition handling, + making it easy to accidentally create (or to accidentally fail to + create) an auto-updating field when you didn't (did) want one.
- DATETIME does not get the same timezone handling TIMESTAMP does. + What? And you can't provide your own without resorting to hacks like + extra columns.
- Oh, did you want to use MySQL's timezone support? Too bad, none of + that data's loaded by default. You have to process the OS's tzinfo + files into SQL with a separate tool and import that. If you ever want to + update MySQL's timezone settings later, you need to take the server down + just to make sure the changes apply.
+

Preserving Data

... against unexpected changes: like most disk-backed storage systems, MySQL +is as reliable as the disks and filesystems its data lives on. MySQL provides +no additional functionality in terms of mirroring or hardware failure tolerance +(such as Oracle ASM). +However this is a limitation shared with many, many other systems.

When using the InnoDB storage engine (default since MySQL 5.5), MySQL maintains page +checksums in order to detect corruption caused by underlying storage. However, +many third-party software applications, as sell as users upgrading +from earlier versions of MySQL may be using MyISAM, which will frequently corrupt +data files on improper shutdown.

The implicit conversion rules that bite when storing data also bite when +asking MySQL to modify data - my favourite example being a fat-fingered +UPDATE query where a mistyped = (as -, off by a single key) caused 90% +of the rows in the table to be affected, instead of one row, because of +implicit string-to-integer conversions.

... against loss: hoo boy. MySQL, out of the box, gives you three approaches +to backups:

Take “blind” filesystem backups with tar or rsync. Unless you + meticulously lock tables or make the database read-only for the duration, + this produces a backup that requires crash recovery before it will be + usable, and can produce an inconsistent database.
- This can bite quite hard if you use InnoDB, as InnoDB crash recovery + takes time proportional to both the number of InnoDB tables and the + total size of InnoDB tables, with a large constant.
+
Dump to SQL with mysqldump: slow, relatively large backups, and + non-incremental.
Archive binary logs: fragile, complex, over-configurable, and configured + badly by default. (Binary logging is also the basis of MySQL's replication + system.)

If neither of these are sufficient, you're left with purchasing a backup tool +from +Oracle +or from one of the third-party MySQL vendors.

Like many of MySQL's features, the binary logging feature is +too +configurable, +while still, somehow, defaulting to modes that are hazardous or surprising: +the +default +behaviour +is to log SQL statements, rather than logging their side effects. This has +lead to numerous bugs over the years; MySQL (now) makes an effort to make +common “non-deterministic” cases such as NOW() and RANDOM() act +deterministically but these have been addressed using ad-hoc solutions. +Restoring binary-log-based backups can easily lead to data that differs from +the original system, and by the time you've noticed the problem, it's too late +to do anything about it.

(Seriously. The binary log entries for each statement contain the “current” +time on the master and the random seed at the start of the statement, just in +case. If your non-deterministic query uses any other function, you're still +fucked by +default.)

Additionally, a number of apparently-harmless features can lead to backups or +replicas wandering out of sync with the original database, in the default +configuration:

AUTO_INCREMENT and UPDATE statements.
AUTO_INCREMENT and INSERT statements (sometimes). SURPRISE.
Triggers.
User-defined (native) functions.
Stored (procedural SQL) functions.
DELETE ... LIMIT and UPDATE ... LIMIT statements, though if you use + these, you've misunderstood how SQL is supposed to work.
INSERT ... ON DUPLICATE KEY UPDATE statements.
Bulk-loading data with LOAD DATA statements.
Operations on floating-point + values.

Retrieving Data

This mostly works as expected. Most of the ways MySQL will screw you happen +when you store data, not when you retrieve it. However, there are a few things +that implicitly transform stored data before returning it:

+
MySQL's surreal type conversion system works the same way during SELECT + that it works during other operations, which can lead to queries matching + unexpected rows:
+
```
owen@scratch> CREATE TABLE account (
+    ->     accountid INTEGER
+    ->         AUTO_INCREMENT
+    ->         PRIMARY KEY,
+    ->     discountid INTEGER
+    -> );
+Query OK, 0 rows affected (0.54 sec)
+
+owen@scratch> INSERT INTO account
+    ->     (discountid)
+    -> VALUES
+    ->     (0),
+    ->     (1),
+    ->     (2);
+Query OK, 3 rows affected (0.03 sec)
+Records: 3  Duplicates: 0  Warnings: 0
+
+owen@scratch> SELECT *
+    -> FROM account
+    -> WHERE discountid = 'banana';
++-----------+------------+
+| accountid | discountid |
++-----------+------------+
+|         1 |          0 |
++-----------+------------+
+1 row in set, 1 warning (0.05 sec)
+
```
+
Ok, unexpected, but there's at least a warning (do your apps check for +those?) - let's see what it says:
+
```
owen@scratch> SHOW WARNINGS;
++---------+------+--------------------------------------------+
+| Level   | Code | Message                                    |
++---------+------+--------------------------------------------+
+| Warning | 1292 | Truncated incorrect DOUBLE value: 'banana' |
++---------+------+--------------------------------------------+
+1 row in set (0.03 sec)
+
```
+
I can count on one hand the number of DOUBLE columns in this example and +still have five fingers left over.
+
You might think this is an unreasonable example: maybe you should always +make sure your argument types exactly match the field types, and the query +should use 57 instead of 'banana'. (This does actually “fix” the +problem.) It's unrealistic to expect every single user to run SHOW CREATE +TABLE before every single query, or to memorize the types of every column +in your schema, though. This example derived from a technically-skilled +but MySQL-ignorant tester examining MySQL data to verify some behavioural +changes in an app.
+
- +
  Actually, you don't even need a table for this: SELECT 0 = 'banana' + returns 1. Did the PHP folks design + MySQL's = operator?
  +
- +
  This isn't affected by sql_mode, even though so many other things are.
  +
+
+
TIMESTAMP columns (and only TIMESTAMP columns) can return + apparently-differing values for the same stored value depending on + per-connection configuration even during read-only operation. This is done + silently and the default behaviour can change as a side effect of non-MySQL + configuration changes in the underlying OS.
+
String-typed columns are transformed for encoding on output if the + connection is not using the same encoding as the underlying storage, using + the same rules as the transformation on input.
Values that stricter sql_mode settings would reject during storage can + still be returned during retrieval; it is impossible to predict in advance + whether such data exists, since clients are free to set sql_mode to any + value at any time.

Efficiency

For purely store-and-retrieve applications, MySQL's query planner (which +transforms the miniature program contained in each SQL statement into a tree +of disk access and data manipulation steps) is sufficient, but only barely. +Queries that retrieve data from one table, or from one table and a small +number of one-to-maybe-one related tables, produce relatively efficient plans.

MySQL, however, offers a number of tuning options that can have dramatic and +counterintuitive effects, and the documentation provides very little advice +for choosing settings. Tuning relies on the administrator's personal +experience, blog articles of varying quality, and consultants.

The MySQL query cache defaults to a non-zero size in some commonly-installed + configurations. However, the larger the cache, the slower writes proceed: + invalidating cache entries that include the tables modified by a query means + considering every entry in the cache. This cache also uses MySQL's LRU + implementation, which has its own performance problems during eviction that + get worse with larger cache sizes.
Memory-management settings, including key_buffer_size and innodb_buffer_pool_size, + have non-linear relationships with performance. The standard + advice advises + making whichever value you care about more to a large value, but this can be + counterproductive if the related data is larger than the pool can hold: + MySQL is once again bad at discarding old buffer pages when the buffer is + exhausted, leading to dramatic slowdowns when query load reaches a certain + point.
- This also affects filesystem tuning settings such as table_open_cache.
+
InnoDB, out of the box, comes configured to use one large (and automatically + growing) tablespace file for all tables, complicating backups and storage + management. This is fine for trivial databases, but MySQL provides no tools + (aside from DROP TABLE and reloading the data from an SQL dump) for + transplanting a table to another tablespace, and provides no tools (aside + from a filesystem-level rm, and reloading all InnoDB data from an SQL + dump) for reclaiming empty space in a tablespace file.
MySQL itself provides very few tools to manage storage; tasks like storing + large or infrequently-accessed tables and databases on dedicated filesystems + must be done on the filesystem, with MySQL shut down.

Data Processing

Data processing encompasses tasks that require making decisions about data and +tasks that derive new data from existing data. This is a huge range of topics:

Deciding (and enforcing) application-specific validity rules.
Summarizing and deriving data.
Providing and maintaining alternate representations and structures.
Hosting complex domain logic near the data it operates on.

The further towards data processing tasks applications move, the more their +SQL resembles tiny programs sent to the data. MySQL is totally unprepared for +programs, and expects SQL to retrieve or modify simple rows.

Validity

Good constraints are like asserts: in an ideal world, you can't tell if they +work, because your code never violates them. Here in the real world, +constraint violations happen for all sorts of reasons, ranging from buggy code +to buggy human cognition. A good database gives you more places to describe +your expectations and more tools for detecting and preventing surprises. +MySQL, on the other hand, can't validate your data for you, beyond simple (and +fixed) type constraints:

+
As with the data you store in it, MySQL feels free to change your table + definitions implicitly and + silently. + Many of these silent schema changes have important performance and + feature-availability implications.
+
- +
  Foreign keys are ignored if you spell them certain, common, ways:
  +
```
CREATE TABLE foo (
+    -- ...,
+    parent INTEGER
+        NOT NULL
+        REFERENCES foo_parent (id)
+    -- , ...
+)
+
```
  +
  silently ignores the foreign key specification, while
  +
```
CREATE TABLE foo (
+    -- ...,
+    parent INTEGER
+        NOT NULL,
+    FOREIGN KEY (parent)
+        REFERENCES foo_parent (id)
+    -- , ...
+)
+
```
  +
  preserves it.
  +
+
+
Foreign keys, one of the most widely-used database validity checks, are an + engine-specific feature, restricting their availability in combination with + other engine-specific features. (For example, a table cannot have both + foreign key constraints and full-text indexes, as of MySQL 5.5.)
+
- Configurations that violate assumptions about foreign keys, such as a + foreign key pointing into a MyISAM or NDB table, do not cause warnings + or any other diagnostics. The foreign key is simply discarded. SURPRISE. + (MySQL is riddled with these sorts of surprises, and apologists lean + very heavily on the “that's documented” excuse for its bad behaviour.)
+
The MySQL parser recognizes CHECK clauses, which allow schema developers + to make complex declarative assertions about tuples in the database, but + discards them without + warning. If you + want CHECK-like constraints, you must implement them as triggers - but see + below...
MySQL's comprehension of the DEFAULT clause is, uh, limited: only + constants are permitted, except for the special + case + of at most one TIMESTAMP column per table and at most one sequence-derived + column. Who designed this mess?
- Furthermore, there's no way to say “no default” and raise an error when + an INSERT forgets to provide a value. The default DEFAULT is either + NULL or a zero-like constant (0, '', and so on). Even for types + with no meaningful zero-like values (DATETIME).
+
MySQL has no mechanism for introducing new types, which might otherwise + provide a route to enforcing validity. Counting the number of special cases + in MySQL's existing type + system illustrates + why that's probably unfixable.

I hope every client with write access to your data is absolutely perfect, +because MySQL cannot help you if you make a mistake.

Summarizing and Deriving Data

SQL databases generally provide features for doing “interesting” things with +sets of tuples, and MySQL is no exception. However, MySQL's limitations mean +that actually processing data in the database is fraught with wasted money, +brains, and time:

Aggregate (GROUP BY) queries run up against limits in MySQL's query + planner: a query with both WHERE and GROUP BY clauses can only satisfy + one constraint or the other with indexes, unless there's an index that + covers all the relevant fields in both clauses, in the right order. (What + this order is depends on the complexity of the query and on the distribution + of the underlying data, but that's hardly MySQL-specific.)
- If you have all three of WHERE, GROUP BY, and ORDER BY in the same + query, you're more or less fucked. Good luck designing a single index + that satisfies all three.
+
Even though MySQL allows database administrators to define normal functions + in a procedural SQL + dialect, + custom aggregate + functions + can only be defined by native plugins. Good thing, too, because procedural + SQL in MySQL is its own kind of awful - more on that below.
Subqueries are often convenient and occasionally necessary for expressing + multi-step transformations on some underlying data. MySQL's query planner + has only one strategy for optimizing them: evaluate the innermost query as + written, into an in-memory table, then use a nested loop to satisfy joins or + IN clauses. For large subquery results or interestingly nested subqueries, + this is absurdly slow.
- MySQL's query planner can't fold constraints from outer queries into + subqueries.
- The generated in-memory table never has any indexes, ever, even when + appropriate indexes are “obvious” from the surrounding query; you cannot + even specify them.
- These limitations also affect views, which are evaluated as if they were + subqueries. In combination with the lack of constraint folding in the + planner, this makes filtering or aggregating over large views completely + impractical.
- MySQL lacks common table + expressions. + Even if subquery efficiency problems get fixed, the inability to give + meaningful names to subqueries makes them hard to read and comprehend.
- I hope you like CREATE TEMPORARY TABLE AS SELECT, because that's your + only real alternative.
+
Window + functions do not + exist at all in MySQL. This complicates many kinds of analysis, including + time series analyses and ranking analyses.
- Specific cases (for example, assigning rank numbers to rows) can be + implemented using server-side variables and side effects during + SELECT. + What? Good luck understanding that code in six months.
+
Even interesting joins run into trouble. MySQL's query planner has trouble + with a number of cases that can easily arise in well-normalized data:
- Joining and ordering by rows from multiple tables often forces MySQL to + dump the whole join to a temporary table, then sort it -- awful, + especially if you then use LIMIT BY to paginate the results.
- JOIN clauses with non-trivial conditions, such as joins by range or + joins by similarity, generally cause the planner to revert to table + scans even if the same condition would be indexable outside of a join.
- Joins with WHERE clauses that span both tables, where the rows + selected by the WHERE clause are outliers relative to the table + statistics, often cause MySQL to access tables in suboptimal order.
+
Ok, forget about interesting joins. Even interesting WHERE clauses can run + into trouble: MySQL can't index deterministic functions of a row, either. + While some deterministic functions can be eliminated from the WHERE clause + using simple algebra, many useful cases (whitespace-insensitive comparison, + hash-based comparisons, and so on) can't.
- You can fake these by storing the computed value in the row alongside + the “real” value. This leaves your schema with some ugly data repetition + and a chance for the two to fall out of sync, and clients must use the + “computed” column explicitly.
- Oh, and they must maintain the “computed” version explicitly.
- Or you can use triggers. Ha. See above.
+

And now you know why MySQL advocates are such big fans of doing data +processing in “the client” or “the app.”

Alternate Representations and Derived Tables

Many databases let schema designers and administrators abstract the underlying +“physical” table structure from the presentation given to clients, or to some +specific clients, for any of a number of reasons. MySQL tries to let you do +this, too! And fumbles it quite badly.

As mentioned above, non-trivial views are basically useless. Queries like + SELECT some columns FROM a_view WHERE id = 53 are evaluated in the + stupidest -- and slowest -- possible way. Good luck hiding unusual + partitioning arrangements or a permissions check in a view if you want any + kind of performance.
The poor interactions between triggers and binary logging's default + configuration make it impractical to use triggers to maintain “materialized” + views to avoid the problems with “real” views.
- It also effectively means triggers can't be used to emulate CHECK + constraints and other consistency features.
- Code to maintain materialized views is also finicky and hard to get + “right,” especially if the view includes aggregates or interesting joins + over its source data. I hope you enjoy debugging MySQL's procedural + SQL…
+
For the relatively common case of wanting to abstract partitioned storage + away for clients, MySQL actually has a + tool for it! But + it comes with enough caveats to strangle a + horse:
- It's a separate table engine wrapping a “real” storage engine, which + means it has its own, separate support for engine-specific features: + transactions, foreign keys, and index types, AUTO_INCREMENT, and + others. The syntax for configuring partitions makes selecting the wrong + underlying engine entirely too easy, too.
- Partitioned tables may not be the referrent of foreign keys: you can't + have both enforced relationships and this kind of storage management.
- MySQL doesn't actually know how to store partitions on separate disks or + filesystems. You still need to reach underneath of MySQL do to actual + storage management.
  - Partitioning an InnoDB table under the default InnoDB configuration + stores all of the partitions in the global tablespace file anyways. + Helpful! For per-table configurations, they still all end up + together in the same file. Partitioning InnoDB tables is a waste of + time for managing storage.
  +
- TL,DR: MySQL's partition support is so finicky and limited that + MySQL-based apps tend to opt for multiple MySQL servers (“sharding”) + instead.
+

Hosting Logic In The Database

Yeah, yeah, the usual reaction to stored procedures and in-DB code is “eww, +yuck!” for some not-terrible reasons, but hear me out on two points:

Under the freestanding-database-server paradigm, there will usually be + network latency between database clients and the database itself. There are + two ways to minimize the impact of that: move the data to the code in bulk + to minimize round-trips, or move the code to the data.
Some database administration tasks are better implemented using in-database + code than as freestanding clients: complex data migrations that can't be + expressed as freestanding SQL queries, for example.

MySQL, as of version +5.0 +(released in 2003 -- remember that date, I'll come back to it), has support +for in-database code via a procedural SQL-like dialect, like many other SQL +databases. This includes server-side procedures (blocks of stored code that +are invoked outside of any other statements and return statement-like +results), functions (blocks of stored code that compute a result, used in any +expression context such as a SELECT list or WHERE clause), and triggers +(blocks of stored code that run whenever a row is created, modified, or +deleted).

Given the examples of +other +contemporaneous +procedural +languages, +MySQL's procedural dialect -- an implementation of the +SQL/PSM language -- is quite limited:

There is no language construct for looping over a query result. This seems + like a pretty fundamental feature for a database-hosted language, but no.
There is no language construct for looping while a condition holds. This + seems like a pretty fundamental feature for an imperative language designed + any time after about 1975, but no.
There is no language construct for looping over a range.

There is, in fact, one language construct for looping: the unconditional + loop. All other iteration control is done via conditional LEAVE + statements, as

BEGIN
+    DECLARE c CURSOR FOR
+        SELECT foo, bar, baz
+        FROM some_table
+        WHERE some_condition;
+    DECLARE done INT DEFAULT 0;
+    DECLARE CONTINUE HANDLER FOR NOT FOUND
+        SET done = 1;
+
+    DECLARE c_foo INTEGER;
+    DECLARE c_bar INTEGER;
+    DECLARE c_baz INTEGER;
+
+    OPEN c;
+    process_some_table: LOOP
+        FETCH c INTO c_foo, c_bar, c_baz;
+        IF done THEN
+            LEAVE process_some_table;
+        END IF;
+
+        -- do something with c_foo, c_bar, c_baz
+    END LOOP;
+END;
+

The original “structured programming” revolution in the 1960s seems to +have passed the MySQL team by.

+
Okay, I lied. There are two looping constructs: there's also the REPEAT ... + UNTIL condition END REPEAT construct, analogous to C's do {} while + (!condition); loop. But you still can't loop over query results, and you + can't run zero iterations of the loop's main body this way.
+
There is nothing resembling a modern exception system with automatic scoping + of handlers or declarative exception management. Error handling is entirely + via Visual Basic-style “on condition X, do Y” instructions, which remain in + effect for the rest of the program's execution.
- In the language shipped with MySQL 5.0, there wasn't a way to signal + errors, either: programmers had to resort to stunts like intentionally + issuing failing + queries, + instead. Later versions of the language addressed this with the + SIGNAL + statement: see, + they can learn from better languages, eventually.
+
You can't escape to some other language, since MySQL doesn't have an + extension mechanism for server-side languages or a good way to call + out-of-process services during queries.

The net result is that developing MySQL stored programs is unpleasant, +uncomfortable, and far more error-prone than it could have been.

Why Is MySQL The Way It Is?

MySQL's technology and history contain the seeds of all of these flaws.

Pluggable Storage Engines

Very early in MySQL's life, the MySQL dev team realized that MyISAM was not +the only way to store data, and opted to support other storage backends within +MySQL. This is basically an alright idea; while I personally prefer storage +systems that focus their effort on making one backend work very well, +supporting multiple backends and letting third-party developers write their +own is a pretty good approach too.

Unfortunately, MySQL's storage backend interface puts a very low ceiling on +the ways storage backends can make MySQL behave better.

MySQL's data access paths through table engines are very simple: MySQL asks +the engine to open a table, asks the engine to iterate through the table +returning rows, filters the rows itself (outside of the storage engine), then +asks the engine to close the table. Alternately, MySQL asks the engine to open +a table, asks the engine to retrieve rows in range or for a single value over +a specific index, filters the rows itself, and asks the engine to close the +table.

This simplistic interface frees table engines from having to worry about query +optimization - in theory. Unfortunately, engine-specific features have a large +impact on the performance of various query plans, but the channels back to the +query planner provide very little granularity for estimating cost and prevent +the planner from making good use of the engine in unusual cases. Conversely, +the table engine system is totally isolated from the actual query, and can't +make query-dependent performance choices “on its own.” There's no third path; +the query planner itself is not pluggable.

Similar consequences apply to type checking, support for new types, or even +something as “obvious” as multiple automatic TIMESTAMP columns in the same +table.

Table manipulation -- creation, structural modification, and so on -- runs +into similar problems. MySQL itself parses each CREATE TABLE statement, then +hands off a parsed representation to the table engine so that it can manage +storage. The parsed representation is lossy: there are plenty of forms MySQL's +parser recognizes that aren't representable in a TABLE structure, preventing +engines from implementing, say, column or tuple CHECK constraints without +MySQL's help.

The sheer number of table +engines makes +that help very slow in coming. Any change to the table engine interface means +perturbing the code to each engine, making progress on new MySQL-level +features that interact with storage such as better query planning or new SQL +constructs necessarily slow to implement and slow to test.

Held Back By History

The original MySQL team focused on pure read performance and on “ease of use” +(for new users with simple needs, as far as I can tell) over correctness and +completeness, violating Knuth's laws of optimization. Many of these decisions +locked MySQL into behaviours very early in its life that it still displays +now. Features like implicit type conversions legitimately do help streamline +development in very simple cases; experience with other +languages +unfortunately shows that the same behaviours sandbag development and help hide +bugs in more sophisticated scenarios.

MySQL has since changed hands, and the teams working on MySQL (and MariaDB, +and Percona) are much more mature now than the team that made those early +decisions. MySQL's massive and frequently non-savvy userbase makes it very hard +to introduce breaking changes. At the same time, adding optional breaking +changes via server and client mode flags (such as sql_mode) increases the +cognitive overhead of understanding MySQL's behaviours -- especially when that +behaviour can vary from client to client, or when the server's configuration is +out of the user's control (for example, on a shared host, or on EC2).

A solution similar to Python's from __future__ import pragmas for making +breaking changes opt-in some releases in advance of making them mandatory +might help, but MySQL doesn't have the kind of highly-invested, highly-skilled +user base that would make that effective -- and it still has all of the +problems of modal behaviour.

Bad Arguments

Inevitably, someone's going to come along and tell me how wrong I am and how +MySQL is just fine as a database system. These people are everywhere, and they +mean well too, and they are almost all wrong. There are two good reasons to +use MySQL:

Some earlier group wrote for it, and we haven't finished porting our code + off of MySQL.
We've considered all of these points, and many more, and decided that + ___feature_x___ that MySQL offers is worth the hassle.

Unfortunately, these aren't the reasons people do give, generally. The +following are much more common:

It's good enough. No it ain't. There are plenty of other equally-capable + data storage systems that don't come with MySQL's huge raft of edge cases + and quirks.
- We haven't run into these problems. Actually, a lot of these + problems happen silently. Odds are, unless you write your queries and + schema statements with the manual open and refer back to it constantly, + or have been using MySQL since the 3.x era daily, at least some of + these issues have bitten you. The ones that prevent you from using your + database intelligently are very hard to notice in action.
+
We already know how to use it. MySQL development and administration + causes brain damage, folks, the same way PHP does. Where PHP teaches + programmers that “array” is the only structure you need, MySQL teaches + people that databases are awkward, slow, hard-to-tune monsters that require + constant attention. That doesn't have to be true; there are comfortable, + fast, and easily-tuned systems out there that don't require daily care and + feeding or the love of a specialist.
It's the only thing our host supports. Get a better host. It's + not like they're expensive or hard to find.
- We used it because it was there. Please hire some fucking software + developers and go back to writing elevator pitches and flirting with Y + Combinator.
+
Everybody knows MySQL. It's easy to hire MySQL folks. It's easy to hire + MCSEs, too, but you should be hiring for attitude and ability to learn, not + for specific skillsets, if you want to run a successful software project.
- It's popular. Sure, and nobody ever got fired for buying + IBM/Microsoft/Adobe. Popularity isn't any indication of quality, and if + we let popularity dictate what technology we use and improve we'll never + get anywhere. Marketing software to geeks is easy - it's just that + lots of high-quality projects don't bother.
+
It's lightweight. So's SQLite 3 or + H2. If you care about deployment + footprint more than any other factor, MySQL is actually pretty clunky (and + embedded MySQL has even bigger problems than freestanding MySQL).
It's getting better, so we might as well stay on it. It's + true, if you go + by feature checklists and the manual, MySQL is improving “rapidly.” 5.6 is + due out soon and superficially looks to contain a number of good changes. I + have two problems with this line of reasoning:
1. Why wait? Other databases are good now, not eventually.
2. MySQL has a history of providing the bare minimum to satisfy a feature +checkbox without actually making the feature work well, work consistently, +or work in combination with other features.
+

+ + + +

+ + +comments powered by Disqus +

+ + + + + +