From 76aed6ef732de38d82245b3d674f70bab30221e5 Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Fri, 3 Jul 2015 22:31:49 -0400 Subject: Fuck it, serve the files directly. --- .html/mysql/choose-something-else.html | 836 +++++++++++++++++++++++++++++++++ 1 file changed, 836 insertions(+) create mode 100644 .html/mysql/choose-something-else.html (limited to '.html/mysql/choose-something-else.html') diff --git a/.html/mysql/choose-something-else.html b/.html/mysql/choose-something-else.html new file mode 100644 index 0000000..ca3a7b2 --- /dev/null +++ b/.html/mysql/choose-something-else.html @@ -0,0 +1,836 @@ + + + + + The Codex » + Do Not Pass This Way Again + + + + + + + + +
+ + + + + +
+

Do Not Pass This Way Again

+

Considering MySQL? Use something else. Already on MySQL? Migrate. For every +successful project built on MySQL, you could uncover a history of time wasted +mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense +of accomplishment over the effort spent making MySQL behave.

+

Thesis: databases fill roles ranging from pure storage to complex and +interesting data processing; MySQL is differently bad at both tasks. Real apps +all fall somewhere between these poles, and suffer variably from both sets of +MySQL flaws.

+ +

Much of this is inspired by the principles behind PHP: A Fractal of Bad +Design. I +suggest reading that article too -- it's got a lot of good thought in it even +if you already know to stay well away from PHP. (If that article offends you, +well, this page probably will too.)

+

Storage

+

Storage systems have four properties:

+
    +
  1. Take and store data they receive from applications.
  2. +
  3. Keep that data safe against loss or accidental change.
  4. +
  5. Provide stored data to applications on demand.
  6. +
  7. Give administrators effective management tools.
  8. +
+

In a truly “pure” storage application, data-comprehension features +(constraints and relationships, nontrivial functions and aggregates) would go +totally unused. There is a time and a place for this: the return of “NoSQL” +storage systems attests to that.

+

Pure storage systems tend to be closely coupled to their “main” application: +consider most web/server app databases. “Secondary” clients tend to be +read-only (reporting applications, monitoring) or to be utilities in service +of the main application (migration tools, documentation tools). If you believe +constraints, validity checks, and other comprehension features can be +implemented in “the application,” you are probably thinking of databases close +to this pole.

+

Storing Data

+

MySQL has many edge cases which reduce the predictability of its behaviour +when storing information. Most of these edge cases are documented, but violate +the principle of least surprise (not to mention the expectations of users +familiar with other SQL implementations).

+
    +
  • Implicit conversions (particularly to and from string types) can modify + MySQL's behaviour.
      +
    • Many implicit conversions are also silent (no warning, no diagnostic), + by design, making it more likely developers are entirely unaware of + them until one does something surprising.
    • +
    +
  • +
  • Conversions that violate basic constraints (range, length) of the output + type often coerce data rather than failing.
      +
    • Sometimes this raises a warning; does your app check for those?
    • +
    • This behaviour is unlike many typed systems (but closely like PHP and + remotely like Perl).
    • +
    +
  • +
  • Conversion behaviour depends on a per-connection configuration value + (sql_mode) that has a large constellation of possible + states, making + it harder to carry expectations from manual testing over to code or from + tool to tool.
  • +
  • MySQL recommends UTF-8 as a character-set, but still defaults to Latin-1. + The implimentation of utf8 up until MySQL 5.5 was only the 3-byte + BMP. + MySQL 5.5 and beyond supports a 4-byte utf8, but confusingly must be set + with the character-set utf8mb4. Implementation details of these encodings + within MySQL, such as the utf8 3-byte limit, tend to leak out into client + applications. Data that does not fit MySQL's understanding of the storage + encoding will be transformed until it does, by truncation or replacement, by + default.
      +
    • Collation support is per-encoding, with one of the stranger default + configurations: by default, the collation orders characters according to + Swedish alphabetization rules, case-insensitively.
    • +
    • Since it's the default, lots of folks who don't know the manual + inside-out and backwards observe MySQL's case-insensitive collation + behaviour ('a' = 'A') and conclude that “MySQL is case-insensitive,” + complicating any effort to use a case-sensitive locale.
    • +
    • Both the encoding and the collation can vary, independently, by + column. Do you keep your schema definition open when you write + queries to watch out for this sort of shit?
    • +
    +
  • +
  • The TIMESTAMP type tries to do something smart by storing values in a + canonical timezone (UTC), but it's done with so few affordances that it's + very hard to even tell that MySQL's done a right thing with your data.
      +
    • And even after that, the result of foo < '2012-04-01 09:00:00' still + depends on what time of year it is when you evaluate the query, unless + you're very careful with your connection timezone.
    • +
    • TIMESTAMP is also special-cased in MySQL's schema definition handling, + making it easy to accidentally create (or to accidentally fail to + create) an auto-updating field when you didn't (did) want one.
    • +
    • DATETIME does not get the same timezone handling TIMESTAMP does. + What? And you can't provide your own without resorting to hacks like + extra columns.
    • +
    • Oh, did you want to use MySQL's timezone support? Too bad, none of + that data's loaded by default. You have to process the OS's tzinfo + files into SQL with a separate tool and import that. If you ever want to + update MySQL's timezone settings later, you need to take the server down + just to make sure the changes apply.
    • +
    +
  • +
+

Preserving Data

+

... against unexpected changes: like most disk-backed storage systems, MySQL +is as reliable as the disks and filesystems its data lives on. MySQL provides +no additional functionality in terms of mirroring or hardware failure tolerance +(such as Oracle ASM). +However this is a limitation shared with many, many other systems.

+

When using the InnoDB storage engine (default since MySQL 5.5), MySQL maintains page +checksums in order to detect corruption caused by underlying storage. However, +many third-party software applications, as sell as users upgrading +from earlier versions of MySQL may be using MyISAM, which will frequently corrupt +data files on improper shutdown.

+

The implicit conversion rules that bite when storing data also bite when +asking MySQL to modify data - my favourite example being a fat-fingered +UPDATE query where a mistyped = (as -, off by a single key) caused 90% +of the rows in the table to be affected, instead of one row, because of +implicit string-to-integer conversions.

+

... against loss: hoo boy. MySQL, out of the box, gives you three approaches +to backups:

+
    +
  • Take “blind” filesystem backups with tar or rsync. Unless you + meticulously lock tables or make the database read-only for the duration, + this produces a backup that requires crash recovery before it will be + usable, and can produce an inconsistent database.
      +
    • This can bite quite hard if you use InnoDB, as InnoDB crash recovery + takes time proportional to both the number of InnoDB tables and the + total size of InnoDB tables, with a large constant.
    • +
    +
  • +
  • Dump to SQL with mysqldump: slow, relatively large backups, and + non-incremental.
  • +
  • Archive binary logs: fragile, complex, over-configurable, and configured + badly by default. (Binary logging is also the basis of MySQL's replication + system.)
  • +
+

If neither of these are sufficient, you're left with purchasing a backup tool +from +Oracle +or from one of the third-party MySQL vendors.

+

Like many of MySQL's features, the binary logging feature is +too +configurable, +while still, somehow, defaulting to modes that are hazardous or surprising: +the +default +behaviour +is to log SQL statements, rather than logging their side effects. This has +lead to numerous bugs over the years; MySQL (now) makes an effort to make +common “non-deterministic” cases such as NOW() and RANDOM() act +deterministically but these have been addressed using ad-hoc solutions. +Restoring binary-log-based backups can easily lead to data that differs from +the original system, and by the time you've noticed the problem, it's too late +to do anything about it.

+

(Seriously. The binary log entries for each statement contain the “current” +time on the master and the random seed at the start of the statement, just in +case. If your non-deterministic query uses any other function, you're still +fucked by +default.)

+

Additionally, a number of apparently-harmless features can lead to backups or +replicas wandering out of sync with the original database, in the default +configuration:

+
    +
  • AUTO_INCREMENT and UPDATE statements.
  • +
  • AUTO_INCREMENT and INSERT statements (sometimes). SURPRISE.
  • +
  • Triggers.
  • +
  • User-defined (native) functions.
  • +
  • Stored (procedural SQL) functions.
  • +
  • DELETE ... LIMIT and UPDATE ... LIMIT statements, though if you use + these, you've misunderstood how SQL is supposed to work.
  • +
  • INSERT ... ON DUPLICATE KEY UPDATE statements.
  • +
  • Bulk-loading data with LOAD DATA statements.
  • +
  • Operations on floating-point + values.
  • +
+

Retrieving Data

+

This mostly works as expected. Most of the ways MySQL will screw you happen +when you store data, not when you retrieve it. However, there are a few things +that implicitly transform stored data before returning it:

+
    +
  • +

    MySQL's surreal type conversion system works the same way during SELECT + that it works during other operations, which can lead to queries matching + unexpected rows:

    +
    owen@scratch> CREATE TABLE account (
    +    ->     accountid INTEGER
    +    ->         AUTO_INCREMENT
    +    ->         PRIMARY KEY,
    +    ->     discountid INTEGER
    +    -> );
    +Query OK, 0 rows affected (0.54 sec)
    +
    +owen@scratch> INSERT INTO account
    +    ->     (discountid)
    +    -> VALUES
    +    ->     (0),
    +    ->     (1),
    +    ->     (2);
    +Query OK, 3 rows affected (0.03 sec)
    +Records: 3  Duplicates: 0  Warnings: 0
    +
    +owen@scratch> SELECT *
    +    -> FROM account
    +    -> WHERE discountid = 'banana';
    ++-----------+------------+
    +| accountid | discountid |
    ++-----------+------------+
    +|         1 |          0 |
    ++-----------+------------+
    +1 row in set, 1 warning (0.05 sec)
    +
    +

    Ok, unexpected, but there's at least a warning (do your apps check for +those?) - let's see what it says:

    +
    owen@scratch> SHOW WARNINGS;
    ++---------+------+--------------------------------------------+
    +| Level   | Code | Message                                    |
    ++---------+------+--------------------------------------------+
    +| Warning | 1292 | Truncated incorrect DOUBLE value: 'banana' |
    ++---------+------+--------------------------------------------+
    +1 row in set (0.03 sec)
    +
    +

    I can count on one hand the number of DOUBLE columns in this example and +still have five fingers left over.

    +

    You might think this is an unreasonable example: maybe you should always +make sure your argument types exactly match the field types, and the query +should use 57 instead of 'banana'. (This does actually “fix” the +problem.) It's unrealistic to expect every single user to run SHOW CREATE +TABLE before every single query, or to memorize the types of every column +in your schema, though. This example derived from a technically-skilled +but MySQL-ignorant tester examining MySQL data to verify some behavioural +changes in an app.

    +
      +
    • +

      Actually, you don't even need a table for this: SELECT 0 = 'banana' + returns 1. Did the PHP folks design + MySQL's = operator?

      +
    • +
    • +

      This isn't affected by sql_mode, even though so many other things are.

      +
    • +
    +
  • +
  • +

    TIMESTAMP columns (and only TIMESTAMP columns) can return + apparently-differing values for the same stored value depending on + per-connection configuration even during read-only operation. This is done + silently and the default behaviour can change as a side effect of non-MySQL + configuration changes in the underlying OS.

    +
  • +
  • String-typed columns are transformed for encoding on output if the + connection is not using the same encoding as the underlying storage, using + the same rules as the transformation on input.
  • +
  • Values that stricter sql_mode settings would reject during storage can + still be returned during retrieval; it is impossible to predict in advance + whether such data exists, since clients are free to set sql_mode to any + value at any time.
  • +
+

Efficiency

+

For purely store-and-retrieve applications, MySQL's query planner (which +transforms the miniature program contained in each SQL statement into a tree +of disk access and data manipulation steps) is sufficient, but only barely. +Queries that retrieve data from one table, or from one table and a small +number of one-to-maybe-one related tables, produce relatively efficient plans.

+

MySQL, however, offers a number of tuning options that can have dramatic and +counterintuitive effects, and the documentation provides very little advice +for choosing settings. Tuning relies on the administrator's personal +experience, blog articles of varying quality, and consultants.

+
    +
  • The MySQL query cache defaults to a non-zero size in some commonly-installed + configurations. However, the larger the cache, the slower writes proceed: + invalidating cache entries that include the tables modified by a query means + considering every entry in the cache. This cache also uses MySQL's LRU + implementation, which has its own performance problems during eviction that + get worse with larger cache sizes.
  • +
  • Memory-management settings, including key_buffer_size and innodb_buffer_pool_size, + have non-linear relationships with performance. The standard + advice advises + making whichever value you care about more to a large value, but this can be + counterproductive if the related data is larger than the pool can hold: + MySQL is once again bad at discarding old buffer pages when the buffer is + exhausted, leading to dramatic slowdowns when query load reaches a certain + point.
      +
    • This also affects filesystem tuning settings such as table_open_cache.
    • +
    +
  • +
  • InnoDB, out of the box, comes configured to use one large (and automatically + growing) tablespace file for all tables, complicating backups and storage + management. This is fine for trivial databases, but MySQL provides no tools + (aside from DROP TABLE and reloading the data from an SQL dump) for + transplanting a table to another tablespace, and provides no tools (aside + from a filesystem-level rm, and reloading all InnoDB data from an SQL + dump) for reclaiming empty space in a tablespace file.
  • +
  • MySQL itself provides very few tools to manage storage; tasks like storing + large or infrequently-accessed tables and databases on dedicated filesystems + must be done on the filesystem, with MySQL shut down.
  • +
+

Data Processing

+

Data processing encompasses tasks that require making decisions about data and +tasks that derive new data from existing data. This is a huge range of topics:

+
    +
  • Deciding (and enforcing) application-specific validity rules.
  • +
  • Summarizing and deriving data.
  • +
  • Providing and maintaining alternate representations and structures.
  • +
  • Hosting complex domain logic near the data it operates on.
  • +
+

The further towards data processing tasks applications move, the more their +SQL resembles tiny programs sent to the data. MySQL is totally unprepared for +programs, and expects SQL to retrieve or modify simple rows.

+

Validity

+

Good constraints are like asserts: in an ideal world, you can't tell if they +work, because your code never violates them. Here in the real world, +constraint violations happen for all sorts of reasons, ranging from buggy code +to buggy human cognition. A good database gives you more places to describe +your expectations and more tools for detecting and preventing surprises. +MySQL, on the other hand, can't validate your data for you, beyond simple (and +fixed) type constraints:

+
    +
  • +

    As with the data you store in it, MySQL feels free to change your table + definitions implicitly and + silently. + Many of these silent schema changes have important performance and + feature-availability implications.

    +
      +
    • +

      Foreign keys are ignored if you spell them certain, common, ways:

      +
      CREATE TABLE foo (
      +    -- ...,
      +    parent INTEGER
      +        NOT NULL
      +        REFERENCES foo_parent (id)
      +    -- , ...
      +)
      +
      +

      silently ignores the foreign key specification, while

      +
      CREATE TABLE foo (
      +    -- ...,
      +    parent INTEGER
      +        NOT NULL,
      +    FOREIGN KEY (parent)
      +        REFERENCES foo_parent (id)
      +    -- , ...
      +)
      +
      +

      preserves it.

      +
    • +
    +
  • +
  • +

    Foreign keys, one of the most widely-used database validity checks, are an + engine-specific feature, restricting their availability in combination with + other engine-specific features. (For example, a table cannot have both + foreign key constraints and full-text indexes, as of MySQL 5.5.)

    +
      +
    • Configurations that violate assumptions about foreign keys, such as a + foreign key pointing into a MyISAM or NDB table, do not cause warnings + or any other diagnostics. The foreign key is simply discarded. SURPRISE. + (MySQL is riddled with these sorts of surprises, and apologists lean + very heavily on the “that's documented” excuse for its bad behaviour.)
    • +
    +
  • +
  • The MySQL parser recognizes CHECK clauses, which allow schema developers + to make complex declarative assertions about tuples in the database, but + discards them without + warning. If you + want CHECK-like constraints, you must implement them as triggers - but see + below...
  • +
  • MySQL's comprehension of the DEFAULT clause is, uh, limited: only + constants are permitted, except for the special + case + of at most one TIMESTAMP column per table and at most one sequence-derived + column. Who designed this mess?
      +
    • Furthermore, there's no way to say “no default” and raise an error when + an INSERT forgets to provide a value. The default DEFAULT is either + NULL or a zero-like constant (0, '', and so on). Even for types + with no meaningful zero-like values (DATETIME).
    • +
    +
  • +
  • MySQL has no mechanism for introducing new types, which might otherwise + provide a route to enforcing validity. Counting the number of special cases + in MySQL's existing type + system illustrates + why that's probably unfixable.
  • +
+

I hope every client with write access to your data is absolutely perfect, +because MySQL cannot help you if you make a mistake.

+

Summarizing and Deriving Data

+

SQL databases generally provide features for doing “interesting” things with +sets of tuples, and MySQL is no exception. However, MySQL's limitations mean +that actually processing data in the database is fraught with wasted money, +brains, and time:

+
    +
  • Aggregate (GROUP BY) queries run up against limits in MySQL's query + planner: a query with both WHERE and GROUP BY clauses can only satisfy + one constraint or the other with indexes, unless there's an index that + covers all the relevant fields in both clauses, in the right order. (What + this order is depends on the complexity of the query and on the distribution + of the underlying data, but that's hardly MySQL-specific.)
      +
    • If you have all three of WHERE, GROUP BY, and ORDER BY in the same + query, you're more or less fucked. Good luck designing a single index + that satisfies all three.
    • +
    +
  • +
  • Even though MySQL allows database administrators to define normal functions + in a procedural SQL + dialect, + custom aggregate + functions + can only be defined by native plugins. Good thing, too, because procedural + SQL in MySQL is its own kind of awful - more on that below.
  • +
  • Subqueries are often convenient and occasionally necessary for expressing + multi-step transformations on some underlying data. MySQL's query planner + has only one strategy for optimizing them: evaluate the innermost query as + written, into an in-memory table, then use a nested loop to satisfy joins or + IN clauses. For large subquery results or interestingly nested subqueries, + this is absurdly slow.
      +
    • MySQL's query planner can't fold constraints from outer queries into + subqueries.
    • +
    • The generated in-memory table never has any indexes, ever, even when + appropriate indexes are “obvious” from the surrounding query; you cannot + even specify them.
    • +
    • These limitations also affect views, which are evaluated as if they were + subqueries. In combination with the lack of constraint folding in the + planner, this makes filtering or aggregating over large views completely + impractical.
    • +
    • MySQL lacks common table + expressions. + Even if subquery efficiency problems get fixed, the inability to give + meaningful names to subqueries makes them hard to read and comprehend.
    • +
    • I hope you like CREATE TEMPORARY TABLE AS SELECT, because that's your + only real alternative.
    • +
    +
  • +
  • Window + functions do not + exist at all in MySQL. This complicates many kinds of analysis, including + time series analyses and ranking analyses. +
  • +
  • Even interesting joins run into trouble. MySQL's query planner has trouble + with a number of cases that can easily arise in well-normalized data:
      +
    • Joining and ordering by rows from multiple tables often forces MySQL to + dump the whole join to a temporary table, then sort it -- awful, + especially if you then use LIMIT BY to paginate the results.
    • +
    • JOIN clauses with non-trivial conditions, such as joins by range or + joins by similarity, generally cause the planner to revert to table + scans even if the same condition would be indexable outside of a join.
    • +
    • Joins with WHERE clauses that span both tables, where the rows + selected by the WHERE clause are outliers relative to the table + statistics, often cause MySQL to access tables in suboptimal order.
    • +
    +
  • +
  • Ok, forget about interesting joins. Even interesting WHERE clauses can run + into trouble: MySQL can't index deterministic functions of a row, either. + While some deterministic functions can be eliminated from the WHERE clause + using simple algebra, many useful cases (whitespace-insensitive comparison, + hash-based comparisons, and so on) can't.
      +
    • You can fake these by storing the computed value in the row alongside + the “real” value. This leaves your schema with some ugly data repetition + and a chance for the two to fall out of sync, and clients must use the + “computed” column explicitly.
    • +
    • Oh, and they must maintain the “computed” version explicitly.
    • +
    • Or you can use triggers. Ha. See above.
    • +
    +
  • +
+

And now you know why MySQL advocates are such big fans of doing data +processing in “the client” or “the app.”

+

Alternate Representations and Derived Tables

+

Many databases let schema designers and administrators abstract the underlying +“physical” table structure from the presentation given to clients, or to some +specific clients, for any of a number of reasons. MySQL tries to let you do +this, too! And fumbles it quite badly.

+
    +
  • As mentioned above, non-trivial views are basically useless. Queries like + SELECT some columns FROM a_view WHERE id = 53 are evaluated in the + stupidest -- and slowest -- possible way. Good luck hiding unusual + partitioning arrangements or a permissions check in a view if you want any + kind of performance.
  • +
  • The poor interactions between triggers and binary logging's default + configuration make it impractical to use triggers to maintain “materialized” + views to avoid the problems with “real” views.
      +
    • It also effectively means triggers can't be used to emulate CHECK + constraints and other consistency features.
    • +
    • Code to maintain materialized views is also finicky and hard to get + “right,” especially if the view includes aggregates or interesting joins + over its source data. I hope you enjoy debugging MySQL's procedural + SQL…
    • +
    +
  • +
  • For the relatively common case of wanting to abstract partitioned storage + away for clients, MySQL actually has a + tool for it! But + it comes with enough caveats to strangle a + horse:
      +
    • It's a separate table engine wrapping a “real” storage engine, which + means it has its own, separate support for engine-specific features: + transactions, foreign keys, and index types, AUTO_INCREMENT, and + others. The syntax for configuring partitions makes selecting the wrong + underlying engine entirely too easy, too.
    • +
    • Partitioned tables may not be the referrent of foreign keys: you can't + have both enforced relationships and this kind of storage management.
    • +
    • MySQL doesn't actually know how to store partitions on separate disks or + filesystems. You still need to reach underneath of MySQL do to actual + storage management.
        +
      • Partitioning an InnoDB table under the default InnoDB configuration + stores all of the partitions in the global tablespace file anyways. + Helpful! For per-table configurations, they still all end up + together in the same file. Partitioning InnoDB tables is a waste of + time for managing storage.
      • +
      +
    • +
    • TL,DR: MySQL's partition support is so finicky and limited that + MySQL-based apps tend to opt for multiple MySQL servers (“sharding”) + instead.
    • +
    +
  • +
+

Hosting Logic In The Database

+

Yeah, yeah, the usual reaction to stored procedures and in-DB code is “eww, +yuck!” for some not-terrible reasons, but hear me out on two points:

+
    +
  • Under the freestanding-database-server paradigm, there will usually be + network latency between database clients and the database itself. There are + two ways to minimize the impact of that: move the data to the code in bulk + to minimize round-trips, or move the code to the data.
  • +
  • Some database administration tasks are better implemented using in-database + code than as freestanding clients: complex data migrations that can't be + expressed as freestanding SQL queries, for example.
  • +
+

MySQL, as of version +5.0 +(released in 2003 -- remember that date, I'll come back to it), has support +for in-database code via a procedural SQL-like dialect, like many other SQL +databases. This includes server-side procedures (blocks of stored code that +are invoked outside of any other statements and return statement-like +results), functions (blocks of stored code that compute a result, used in any +expression context such as a SELECT list or WHERE clause), and triggers +(blocks of stored code that run whenever a row is created, modified, or +deleted).

+

Given the examples of +other +contemporaneous +procedural +languages, +MySQL's procedural dialect -- an implementation of the +SQL/PSM language -- is quite limited:

+
    +
  • There is no language construct for looping over a query result. This seems + like a pretty fundamental feature for a database-hosted language, but no.
  • +
  • There is no language construct for looping while a condition holds. This + seems like a pretty fundamental feature for an imperative language designed + any time after about 1975, but no.
  • +
  • There is no language construct for looping over a range.
  • +
  • +

    There is, in fact, one language construct for looping: the unconditional + loop. All other iteration control is done via conditional LEAVE + statements, as

    +
    BEGIN
    +    DECLARE c CURSOR FOR
    +        SELECT foo, bar, baz
    +        FROM some_table
    +        WHERE some_condition;
    +    DECLARE done INT DEFAULT 0;
    +    DECLARE CONTINUE HANDLER FOR NOT FOUND
    +        SET done = 1;
    +
    +    DECLARE c_foo INTEGER;
    +    DECLARE c_bar INTEGER;
    +    DECLARE c_baz INTEGER;
    +
    +    OPEN c;
    +    process_some_table: LOOP
    +        FETCH c INTO c_foo, c_bar, c_baz;
    +        IF done THEN
    +            LEAVE process_some_table;
    +        END IF;
    +
    +        -- do something with c_foo, c_bar, c_baz
    +    END LOOP;
    +END;
    +
    +

    The original “structured programming” revolution in the 1960s seems to +have passed the MySQL team by.

    +
  • +
  • +

    Okay, I lied. There are two looping constructs: there's also the REPEAT ... + UNTIL condition END REPEAT construct, analogous to C's do {} while + (!condition); loop. But you still can't loop over query results, and you + can't run zero iterations of the loop's main body this way.

    +
  • +
  • There is nothing resembling a modern exception system with automatic scoping + of handlers or declarative exception management. Error handling is entirely + via Visual Basic-style “on condition X, do Y” instructions, which remain in + effect for the rest of the program's execution.
      +
    • In the language shipped with MySQL 5.0, there wasn't a way to signal + errors, either: programmers had to resort to stunts like intentionally + issuing failing + queries, + instead. Later versions of the language addressed this with the + SIGNAL + statement: see, + they can learn from better languages, eventually.
    • +
    +
  • +
  • You can't escape to some other language, since MySQL doesn't have an + extension mechanism for server-side languages or a good way to call + out-of-process services during queries.
  • +
+

The net result is that developing MySQL stored programs is unpleasant, +uncomfortable, and far more error-prone than it could have been.

+

Why Is MySQL The Way It Is?

+

MySQL's technology and history contain the seeds of all of these flaws.

+

Pluggable Storage Engines

+

Very early in MySQL's life, the MySQL dev team realized that MyISAM was not +the only way to store data, and opted to support other storage backends within +MySQL. This is basically an alright idea; while I personally prefer storage +systems that focus their effort on making one backend work very well, +supporting multiple backends and letting third-party developers write their +own is a pretty good approach too.

+

Unfortunately, MySQL's storage backend interface puts a very low ceiling on +the ways storage backends can make MySQL behave better.

+

MySQL's data access paths through table engines are very simple: MySQL asks +the engine to open a table, asks the engine to iterate through the table +returning rows, filters the rows itself (outside of the storage engine), then +asks the engine to close the table. Alternately, MySQL asks the engine to open +a table, asks the engine to retrieve rows in range or for a single value over +a specific index, filters the rows itself, and asks the engine to close the +table.

+

This simplistic interface frees table engines from having to worry about query +optimization - in theory. Unfortunately, engine-specific features have a large +impact on the performance of various query plans, but the channels back to the +query planner provide very little granularity for estimating cost and prevent +the planner from making good use of the engine in unusual cases. Conversely, +the table engine system is totally isolated from the actual query, and can't +make query-dependent performance choices “on its own.” There's no third path; +the query planner itself is not pluggable.

+

Similar consequences apply to type checking, support for new types, or even +something as “obvious” as multiple automatic TIMESTAMP columns in the same +table.

+

Table manipulation -- creation, structural modification, and so on -- runs +into similar problems. MySQL itself parses each CREATE TABLE statement, then +hands off a parsed representation to the table engine so that it can manage +storage. The parsed representation is lossy: there are plenty of forms MySQL's +parser recognizes that aren't representable in a TABLE structure, preventing +engines from implementing, say, column or tuple CHECK constraints without +MySQL's help.

+

The sheer number of table +engines makes +that help very slow in coming. Any change to the table engine interface means +perturbing the code to each engine, making progress on new MySQL-level +features that interact with storage such as better query planning or new SQL +constructs necessarily slow to implement and slow to test.

+

Held Back By History

+

The original MySQL team focused on pure read performance and on “ease of use” +(for new users with simple needs, as far as I can tell) over correctness and +completeness, violating Knuth's laws of optimization. Many of these decisions +locked MySQL into behaviours very early in its life that it still displays +now. Features like implicit type conversions legitimately do help streamline +development in very simple cases; experience with other +languages +unfortunately shows that the same behaviours sandbag development and help hide +bugs in more sophisticated scenarios.

+

MySQL has since changed hands, and the teams working on MySQL (and MariaDB, +and Percona) are much more mature now than the team that made those early +decisions. MySQL's massive and frequently non-savvy userbase makes it very hard +to introduce breaking changes. At the same time, adding optional breaking +changes via server and client mode flags (such as sql_mode) increases the +cognitive overhead of understanding MySQL's behaviours -- especially when that +behaviour can vary from client to client, or when the server's configuration is +out of the user's control (for example, on a shared host, or on EC2).

+

A solution similar to Python's from __future__ import pragmas for making +breaking changes opt-in some releases in advance of making them mandatory +might help, but MySQL doesn't have the kind of highly-invested, highly-skilled +user base that would make that effective -- and it still has all of the +problems of modal behaviour.

+

Bad Arguments

+

Inevitably, someone's going to come along and tell me how wrong I am and how +MySQL is just fine as a database system. These people are everywhere, and they +mean well too, and they are almost all wrong. There are two good reasons to +use MySQL:

+
    +
  1. Some earlier group wrote for it, and we haven't finished porting our code + off of MySQL.
  2. +
  3. We've considered all of these points, and many more, and decided that + ___feature_x___ that MySQL offers is worth the hassle.
  4. +
+

Unfortunately, these aren't the reasons people do give, generally. The +following are much more common:

+
    +
  • It's good enough. No it ain't. There are plenty of other equally-capable + data storage systems that don't come with MySQL's huge raft of edge cases + and quirks.
      +
    • We haven't run into these problems. Actually, a lot of these + problems happen silently. Odds are, unless you write your queries and + schema statements with the manual open and refer back to it constantly, + or have been using MySQL since the 3.x era daily, at least some of + these issues have bitten you. The ones that prevent you from using your + database intelligently are very hard to notice in action.
    • +
    +
  • +
  • We already know how to use it. MySQL development and administration + causes brain damage, folks, the same way PHP does. Where PHP teaches + programmers that “array” is the only structure you need, MySQL teaches + people that databases are awkward, slow, hard-to-tune monsters that require + constant attention. That doesn't have to be true; there are comfortable, + fast, and easily-tuned systems out there that don't require daily care and + feeding or the love of a specialist.
  • +
  • It's the only thing our host supports. Get a better host. It's + not like they're expensive or hard to find.
      +
    • We used it because it was there. Please hire some fucking software + developers and go back to writing elevator pitches and flirting with Y + Combinator.
    • +
    +
  • +
  • Everybody knows MySQL. It's easy to hire MySQL folks. It's easy to hire + MCSEs, too, but you should be hiring for attitude and ability to learn, not + for specific skillsets, if you want to run a successful software project.
      +
    • It's popular. Sure, and nobody ever got fired for buying + IBM/Microsoft/Adobe. Popularity isn't any indication of quality, and if + we let popularity dictate what technology we use and improve we'll never + get anywhere. Marketing software to geeks is easy - it's just that + lots of high-quality projects don't bother.
    • +
    +
  • +
  • It's lightweight. So's SQLite 3 or + H2. If you care about deployment + footprint more than any other factor, MySQL is actually pretty clunky (and + embedded MySQL has even bigger problems than freestanding MySQL).
  • +
  • It's getting better, so we might as well stay on it. It's + true, if you go + by feature checklists and the manual, MySQL is improving “rapidly.” 5.6 is + due out soon and superficially looks to contain a number of good changes. I + have two problems with this line of reasoning:
      +
    1. Why wait? Other databases are good now, not eventually.
    2. +
    3. MySQL has a history of providing the bare minimum to satisfy a feature +checkbox without actually making the feature work well, work consistently, +or work in combination with other features.
    4. +
    +
  • +
+
+ + + +
+
+ + +comments powered by Disqus +
+ + + + + +
+ + \ No newline at end of file -- cgit v1.2.3