summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorAlice Bevan-McGregor <alice@gothcandy.com>2020-04-14 18:06:43 -0400
committerAlice Bevan-McGregor <alice@gothcandy.com>2020-04-14 18:06:43 -0400
commit21f542a157971f90be1dcbadaad9eaa265ba0a62 (patch)
treeb7a8da50e17bc780ed9242d89a11189c7f5e9452
parent34708dfa902afabf4833c25233132e56514915de (diff)
Restore "mysql/choose-something-else" article.
Restore of the file dropped from 0d6f58c54a7af6c8b4e6cd98663eb36ec4e3accc to an assumed now-correct location to restore accessibility at the original URI. Additionally HTTPS'd many of the links, still requires verification pass for dead links.
-rw-r--r--docs/mysql/choose-something-else.md736
1 files changed, 736 insertions, 0 deletions
diff --git a/docs/mysql/choose-something-else.md b/docs/mysql/choose-something-else.md
new file mode 100644
index 0000000..738950f
--- /dev/null
+++ b/docs/mysql/choose-something-else.md
@@ -0,0 +1,736 @@
+# Do Not Pass This Way Again
+
+Considering MySQL? Use something else. Already on MySQL? Migrate. For every
+successful project built on MySQL, you could uncover a history of time wasted
+mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense
+of accomplishment over the effort spent making MySQL behave.
+
+Thesis: databases fill roles ranging from pure storage to complex and
+interesting data processing; MySQL is differently bad at both tasks. Real apps
+all fall somewhere between these poles, and suffer variably from both sets of
+MySQL flaws.
+
+* MySQL is bad at [storage](#storage).
+* MySQL is bad at [data processing](#data-processing).
+* MySQL is bad [by design](#by-design).
+* [Bad arguments](#bad-arguments) for using MySQL.
+
+Much of this is inspired by the principles behind [PHP: A Fractal of Bad
+Design](http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/). I
+suggest reading that article too -- it's got a lot of good thought in it even
+if you already know to stay well away from PHP. (If that article offends you,
+well, this page probably will too.)
+
+## Storage
+
+Storage systems have four properties:
+
+1. Take and store data they receive from applications.
+2. Keep that data safe against loss or accidental change.
+3. Provide stored data to applications on demand.
+4. Give administrators effective management tools.
+
+In a truly “pure” storage application, data-comprehension features
+(constraints and relationships, nontrivial functions and aggregates) would go
+totally unused. There is a time and a place for this: the return of “NoSQL”
+storage systems attests to that.
+
+Pure storage systems tend to be closely coupled to their “main” application:
+consider most web/server app databases. “Secondary” clients tend to be
+read-only (reporting applications, monitoring) or to be utilities in service
+of the main application (migration tools, documentation tools). If you believe
+constraints, validity checks, and other comprehension features can be
+implemented in “the application,” you are probably thinking of databases close
+to this pole.
+
+### Storing Data
+
+MySQL has many edge cases which reduce the predictability of its behaviour
+when storing information. Most of these edge cases are documented, but violate
+the principle of least surprise (not to mention the expectations of users
+familiar with other SQL implementations).
+
+* Implicit conversions (particularly to and from string types) can modify
+ MySQL's behaviour.
+ * Many implicit conversions are also silent (no warning, no diagnostic),
+ by design, making it more likely developers are entirely unaware of
+ them until one does something surprising.
+* Conversions that violate basic constraints (range, length) of the output
+ type often coerce data rather than failing.
+ * Sometimes this raises a warning; does your app check for those?
+ * This behaviour is unlike many typed systems (but closely like PHP and
+ remotely like Perl).
+* Conversion behaviour depends on a per-connection configuration value
+ (`sql_mode`) that has [a large constellation of possible
+ states](https://dev.mysql.com/doc/refman/5.5/en/sql-mode.html), making
+ it harder to carry expectations from manual testing over to code or from
+ tool to tool.
+* MySQL recommends UTF-8 as a character-set, but still defaults to Latin-1.
+ The implimentation of `utf8` up until MySQL 5.5 was only the 3-byte
+ [BMP](https://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane).
+ MySQL 5.5 and beyond supports a 4-byte `utf8`, but confusingly must be set
+ with the character-set `utf8mb4`. Implementation details of these encodings
+ within MySQL, such as the `utf8` 3-byte limit, tend to leak out into client
+ applications. Data that does not fit MySQL's understanding of the storage
+ encoding will be transformed until it does, by truncation or replacement, by
+ default.
+ * Collation support is per-encoding, with one of the stranger default
+ configurations: by default, the collation orders characters according to
+ Swedish alphabetization rules, case-insensitively.
+ * Since it's the default, lots of folks who don't know the manual
+ inside-out and backwards observe MySQL's case-insensitive collation
+ behaviour (`'a' = 'A'`) and conclude that “MySQL is case-insensitive,”
+ complicating any effort to use a case-sensitive locale.
+ * Both the encoding and the collation can vary, independently, by
+ _column_. Do you keep your schema definition open when you write
+ queries to watch out for this sort of shit?
+* The `TIMESTAMP` type tries to do something smart by storing values in a
+ canonical timezone (UTC), but it's done with so few affordances that it's
+ very hard to even _tell_ that MySQL's done a right thing with your data.
+ * And even after that, the result of `foo < '2012-04-01 09:00:00'` still
+ depends on what time of year it is when you evaluate the query, unless
+ you're very careful with your connection timezone.
+ * `TIMESTAMP` is also special-cased in MySQL's schema definition handling,
+ making it easy to accidentally create (or to accidentally fail to
+ create) an auto-updating field when you didn't (did) want one.
+ * `DATETIME` does not get the same timezone handling `TIMESTAMP` does.
+ What? And you can't provide your own without resorting to hacks like
+ extra columns.
+ * Oh, did you want to _use_ MySQL's timezone support? Too bad, none of
+ that data's loaded by default. You have to process the OS's `tzinfo`
+ files into SQL with a separate tool and import that. If you ever want to
+ update MySQL's timezone settings later, you need to take the server down
+ just to make sure the changes apply.
+
+### Preserving Data
+
+... against unexpected changes: like most disk-backed storage systems, MySQL
+is as reliable as the disks and filesystems its data lives on. MySQL provides
+no additional functionality in terms of mirroring or hardware failure tolerance
+(such as [Oracle ASM](https://en.wikipedia.org/wiki/Automatic_Storage_Management)).
+However this is a limitation shared with many, _many_ other systems.
+
+When using the InnoDB storage engine (default since MySQL 5.5), MySQL maintains page
+checksums in order to detect corruption caused by underlying storage. However,
+many third-party software applications, as sell as users upgrading
+from earlier versions of MySQL may be using MyISAM, which will frequently corrupt
+data files on improper shutdown.
+
+The implicit conversion rules that bite when storing data also bite when
+asking MySQL to modify data - my favourite example being a fat-fingered
+`UPDATE` query where a mistyped `=` (as `-`, off by a single key) caused 90%
+of the rows in the table to be affected, instead of one row, because of
+implicit string-to-integer conversions.
+
+... against loss: hoo boy. MySQL, out of the box, gives you three approaches
+to [backups](https://dev.mysql.com/doc/refman/5.5/en/backup-methods.html):
+
+* Take “blind” filesystem backups with `tar` or `rsync`. Unless you
+ meticulously lock tables or make the database read-only for the duration,
+ this produces a backup that requires crash recovery before it will be
+ usable, and can produce an inconsistent database.
+ * This can bite quite hard if you use InnoDB, as InnoDB crash recovery
+ takes time proportional to both the number of InnoDB tables and the
+ total size of InnoDB tables, with a large constant.
+* Dump to SQL with `mysqldump`: slow, relatively large backups, and
+ non-incremental.
+* Archive binary logs: fragile, complex, over-configurable, and configured
+ badly by default. (Binary logging is also the basis of MySQL's replication
+ system.)
+
+If neither of these are sufficient, you're left with purchasing [a backup tool
+from
+Oracle](https://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_mysql_enterprise_backup)
+or from one of the third-party MySQL vendors.
+
+Like many of MySQL's features, the binary logging feature is
+[too](https://dev.mysql.com/doc/refman/5.5/en/binary-log.html)
+[configurable](https://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html),
+while still, somehow, defaulting to modes that are hazardous or surprising:
+the
+[default](https://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html#sysvar_binlog_format)
+[behaviour](https://dev.mysql.com/doc/refman/5.5/en/replication-formats.html)
+is to log SQL statements, rather than logging their side effects. This has
+lead to numerous bugs over the years; MySQL (now) makes an effort to make
+common “non-deterministic” cases such as `NOW()` and `RANDOM()` act
+deterministically but these have been addressed using ad-hoc solutions.
+Restoring binary-log-based backups can easily lead to data that differs from
+the original system, and by the time you've noticed the problem, it's too late
+to do anything about it.
+
+(Seriously. The binary log entries for each statement contain the “current”
+time on the master and the random seed at the start of the statement, just in
+case. If your non-deterministic query uses any other function, you're still
+[fucked by
+default](https://dev.mysql.com/doc/refman/5.5/en/replication-sbr-rbr.html#replication-sbr-rbr-sbr-disadvantages).)
+
+Additionally, a number of apparently-harmless features can lead to backups or
+replicas wandering out of sync with the original database, in the default
+configuration:
+
+* `AUTO_INCREMENT` and `UPDATE` statements.
+* `AUTO_INCREMENT` and `INSERT` statements (sometimes). SURPRISE.
+* Triggers.
+* User-defined (native) functions.
+* Stored (procedural SQL) functions.
+* `DELETE ... LIMIT` and `UPDATE ... LIMIT` statements, though if you use
+ these, you've misunderstood how SQL is supposed to work.
+* `INSERT ... ON DUPLICATE KEY UPDATE` statements.
+* Bulk-loading data with `LOAD DATA` statements.
+* [Operations on floating-point
+ values](https://dev.mysql.com/doc/refman/5.5/en/replication-features-floatvalues.html).
+
+### Retrieving Data
+
+This mostly works as expected. Most of the ways MySQL will screw you happen
+when you store data, not when you retrieve it. However, there are a few things
+that implicitly transform stored data before returning it:
+
+* MySQL's surreal type conversion system works the same way during `SELECT`
+ that it works during other operations, which can lead to queries matching
+ unexpected rows:
+
+ owen@scratch> CREATE TABLE account (
+ -> accountid INTEGER
+ -> AUTO_INCREMENT
+ -> PRIMARY KEY,
+ -> discountid INTEGER
+ -> );
+ Query OK, 0 rows affected (0.54 sec)
+
+ owen@scratch> INSERT INTO account
+ -> (discountid)
+ -> VALUES
+ -> (0),
+ -> (1),
+ -> (2);
+ Query OK, 3 rows affected (0.03 sec)
+ Records: 3 Duplicates: 0 Warnings: 0
+
+ owen@scratch> SELECT *
+ -> FROM account
+ -> WHERE discountid = 'banana';
+ +-----------+------------+
+ | accountid | discountid |
+ +-----------+------------+
+ | 1 | 0 |
+ +-----------+------------+
+ 1 row in set, 1 warning (0.05 sec)
+
+ Ok, unexpected, but there's at least a warning (do your apps check for
+ those?) - let's see what it says:
+
+ owen@scratch> SHOW WARNINGS;
+ +---------+------+--------------------------------------------+
+ | Level | Code | Message |
+ +---------+------+--------------------------------------------+
+ | Warning | 1292 | Truncated incorrect DOUBLE value: 'banana' |
+ +---------+------+--------------------------------------------+
+ 1 row in set (0.03 sec)
+
+ I can count on one hand the number of `DOUBLE` columns in this example and
+ still have five fingers left over.
+
+ You might think this is an unreasonable example: maybe you should always
+ make sure your argument types exactly match the field types, and the query
+ should use `57` instead of `'banana'`. (This does actually “fix” the
+ problem.) It's unrealistic to expect every single user to run `SHOW CREATE
+ TABLE` before every single query, or to memorize the types of every column
+ in your schema, though. This example derived from a technically-skilled
+ but MySQL-ignorant tester examining MySQL data to verify some behavioural
+ changes in an app.
+
+ * Actually, you don't even need a table for this: `SELECT 0 = 'banana'`
+ returns `1`. Did the [PHP](http://phpsadness.com/sad/52) folks design
+ MySQL's `=` operator?
+
+ * This isn't affected by `sql_mode`, even though so many other things are.
+
+* `TIMESTAMP` columns (and _only_ `TIMESTAMP` columns) can return
+ apparently-differing values for the same stored value depending on
+ per-connection configuration even during read-only operation. This is done
+ silently and the default behaviour can change as a side effect of non-MySQL
+ configuration changes in the underlying OS.
+* String-typed columns are transformed for encoding on output if the
+ connection is not using the same encoding as the underlying storage, using
+ the same rules as the transformation on input.
+* Values that stricter `sql_mode` settings would reject during storage can
+ still be returned during retrieval; it is impossible to predict in advance
+ whether such data exists, since clients are free to set `sql_mode` to any
+ value at any time.
+
+### Efficiency
+
+For purely store-and-retrieve applications, MySQL's query planner (which
+transforms the miniature program contained in each SQL statement into a tree
+of disk access and data manipulation steps) is sufficient, but only barely.
+Queries that retrieve data from one table, or from one table and a small
+number of one-to-maybe-one related tables, produce relatively efficient plans.
+
+MySQL, however, offers a number of tuning options that can have dramatic and
+counterintuitive effects, and the documentation provides very little advice
+for choosing settings. Tuning relies on the administrator's personal
+experience, blog articles of varying quality, and consultants.
+
+* The MySQL query cache defaults to a non-zero size in some commonly-installed
+ configurations. However, the larger the cache, the slower writes proceed:
+ invalidating cache entries that include the tables modified by a query means
+ considering every entry in the cache. This cache also uses MySQL's LRU
+ implementation, which has its own performance problems during eviction that
+ get worse with larger cache sizes.
+* Memory-management settings, including `key_buffer_size` and `innodb_buffer_pool_size`,
+ have non-linear relationships with performance. The [standard](http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/)
+ [advice](http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/) advises
+ making whichever value you care about more to a large value, but this can be
+ counterproductive if the related data is larger than the pool can hold:
+ MySQL is once again bad at discarding old buffer pages when the buffer is
+ exhausted, leading to dramatic slowdowns when query load reaches a certain
+ point.
+ * This also affects filesystem tuning settings such as `table_open_cache`.
+* InnoDB, out of the box, comes configured to use one large (and automatically
+ growing) tablespace file for all tables, complicating backups and storage
+ management. This is fine for trivial databases, but MySQL provides no tools
+ (aside from `DROP TABLE` and reloading the data from an SQL dump) for
+ transplanting a table to another tablespace, and provides no tools (aside
+ from a filesystem-level `rm`, and reloading _all_ InnoDB data from an SQL
+ dump) for reclaiming empty space in a tablespace file.
+* MySQL itself provides very few tools to manage storage; tasks like storing
+ large or infrequently-accessed tables and databases on dedicated filesystems
+ must be done on the filesystem, with MySQL shut down.
+
+## Data Processing
+
+Data processing encompasses tasks that require making decisions about data and
+tasks that derive new data from existing data. This is a huge range of topics:
+
+* Deciding (and enforcing) application-specific validity rules.
+* Summarizing and deriving data.
+* Providing and maintaining alternate representations and structures.
+* Hosting complex domain logic near the data it operates on.
+
+The further towards data processing tasks applications move, the more their
+SQL resembles tiny programs sent to the data. MySQL is totally unprepared for
+programs, and expects SQL to retrieve or modify simple rows.
+
+### Validity
+
+Good constraints are like `assert`s: in an ideal world, you can't tell if they
+work, because your code never violates them. Here in the real world,
+constraint violations happen for all sorts of reasons, ranging from buggy code
+to buggy human cognition. A good database gives you more places to describe
+your expectations and more tools for detecting and preventing surprises.
+MySQL, on the other hand, can't validate your data for you, beyond simple (and
+fixed) type constraints:
+
+* As with the data you store in it, MySQL feels free to change your table
+ definitions [implicitly and
+ silently](https://dev.mysql.com/doc/refman/5.5/en/silent-column-changes.html).
+ Many of these silent schema changes have important performance and
+ feature-availability implications.
+ * Foreign keys are ignored if you spell them certain, common, ways:
+
+ CREATE TABLE foo (
+ -- ...,
+ parent INTEGER
+ NOT NULL
+ REFERENCES foo_parent (id)
+ -- , ...
+ )
+
+ silently ignores the foreign key specification, while
+
+ CREATE TABLE foo (
+ -- ...,
+ parent INTEGER
+ NOT NULL,
+ FOREIGN KEY (parent)
+ REFERENCES foo_parent (id)
+ -- , ...
+ )
+
+ preserves it.
+
+* Foreign keys, one of the most widely-used database validity checks, are an
+ engine-specific feature, restricting their availability in combination with
+ other engine-specific features. (For example, a table cannot have both
+ foreign key constraints and full-text indexes, as of MySQL 5.5.)
+ * Configurations that violate assumptions about foreign keys, such as a
+ foreign key pointing into a MyISAM or NDB table, do not cause warnings
+ or any other diagnostics. The foreign key is simply discarded. SURPRISE.
+ (MySQL is riddled with these sorts of surprises, and apologists lean
+ very heavily on the “that's documented” excuse for its bad behaviour.)
+* The MySQL parser recognizes `CHECK` clauses, which allow schema developers
+ to make complex declarative assertions about tuples in the database, but
+ [discards them without
+ warning](https://dev.mysql.com/doc/refman/5.5/en/create-table.html). If you
+ want `CHECK`-like constraints, you must implement them as triggers - but see
+ below...
+* MySQL's comprehension of the `DEFAULT` clause is, uh, limited: only
+ constants are permitted, except for the [special
+ case](https://dev.mysql.com/doc/refman/5.5/en/timestamp-initialization.html)
+ of at most one `TIMESTAMP` column per table and at most one sequence-derived
+ column. Who designed this mess?
+ * Furthermore, there's no way to say “no default” and raise an error when
+ an INSERT forgets to provide a value. The default `DEFAULT` is either
+ `NULL` or a zero-like constant (`0`, `''`, and so on). Even for types
+ with no meaningful zero-like values (`DATETIME`).
+* MySQL has no mechanism for introducing new types, which might otherwise
+ provide a route to enforcing validity. Counting the number of special cases
+ in MySQL's [existing type
+ system](https://dev.mysql.com/doc/refman/5.5/en/data-types.html) illustrates
+ why that's probably unfixable.
+
+I hope every client with write access to your data is absolutely perfect,
+because MySQL _cannot help you_ if you make a mistake.
+
+### Summarizing and Deriving Data
+
+SQL databases generally provide features for doing “interesting” things with
+sets of tuples, and MySQL is no exception. However, MySQL's limitations mean
+that actually processing data in the database is fraught with wasted money,
+brains, and time:
+
+* Aggregate (`GROUP BY`) queries run up against limits in MySQL's query
+ planner: a query with both `WHERE` and `GROUP BY` clauses can only satisfy
+ one constraint or the other with indexes, unless there's an index that
+ covers all the relevant fields in both clauses, in the right order. (What
+ this order is depends on the complexity of the query and on the distribution
+ of the underlying data, but that's hardly MySQL-specific.)
+ * If you have all three of `WHERE`, `GROUP BY`, and `ORDER BY` in the same
+ query, you're more or less fucked. Good luck designing a single index
+ that satisfies all three.
+* Even though MySQL allows database administrators to [define normal functions
+ in a procedural SQL
+ dialect](https://dev.mysql.com/doc/refman/5.5/en/create-procedure.html),
+ [custom aggregate
+ functions](https://dev.mysql.com/doc/refman/5.5/en/create-function-udf.html)
+ can only be defined by native plugins. Good thing, too, because procedural
+ SQL in MySQL is its own kind of awful - more on that below.
+* Subqueries are often convenient and occasionally necessary for expressing
+ multi-step transformations on some underlying data. MySQL's query planner
+ has only one strategy for optimizing them: evaluate the innermost query as
+ written, into an in-memory table, then use a nested loop to satisfy joins or
+ `IN` clauses. For large subquery results or interestingly nested subqueries,
+ this is absurdly slow.
+ * MySQL's query planner can't fold constraints from outer queries into
+ subqueries.
+ * The generated in-memory table never has any indexes, ever, even when
+ appropriate indexes are “obvious” from the surrounding query; you cannot
+ even specify them.
+ * These limitations also affect views, which are evaluated as if they were
+ subqueries. In combination with the lack of constraint folding in the
+ planner, this makes filtering or aggregating over large views completely
+ impractical.
+ * MySQL lacks [common table
+ expressions](https://www.postgresql.org/docs/9.2/static/queries-with.html).
+ Even if subquery efficiency problems get fixed, the inability to give
+ meaningful names to subqueries makes them hard to read and comprehend.
+ * I hope you like `CREATE TEMPORARY TABLE AS SELECT`, because that's your
+ only real alternative.
+* [Window
+ functions](https://en.wikipedia.org/wiki/Select_(SQL)#Window_function) do not
+ exist at all in MySQL. This complicates many kinds of analysis, including
+ time series analyses and ranking analyses.
+ * Specific cases (for example, assigning rank numbers to rows) can be
+ implemented using [server-side variables and side effects during
+ `SELECT`](https://stackoverflow.com/questions/6473800/assigning-row-rank-numbers).
+ What? Good luck understanding that code in six months.
+* Even interesting joins run into trouble. MySQL's query planner has trouble
+ with a number of cases that can easily arise in well-normalized data:
+ * Joining and ordering by rows from multiple tables often forces MySQL to
+ dump the whole join to a temporary table, then sort it -- awful,
+ especially if you then use `LIMIT BY` to paginate the results.
+ * `JOIN` clauses with non-trivial conditions, such as joins by range or
+ joins by similarity, generally cause the planner to revert to table
+ scans even if the same condition would be indexable outside of a join.
+ * Joins with `WHERE` clauses that span both tables, where the rows
+ selected by the `WHERE` clause are outliers relative to the table
+ statistics, often cause MySQL to access tables in suboptimal order.
+* Ok, forget about interesting joins. Even interesting `WHERE` clauses can run
+ into trouble: MySQL can't index deterministic functions of a row, either.
+ While some deterministic functions can be eliminated from the `WHERE` clause
+ using simple algebra, many useful cases (whitespace-insensitive comparison,
+ hash-based comparisons, and so on) can't.
+ * You can fake these by storing the computed value in the row alongside
+ the “real” value. This leaves your schema with some ugly data repetition
+ and a chance for the two to fall out of sync, and clients must use the
+ “computed” column explicitly.
+ * Oh, and they must maintain the “computed” version explicitly.
+ * Or you can use triggers. Ha. See above.
+
+And now you know why MySQL advocates are such big fans of doing data
+_processing_ in “the client” or “the app.”
+
+### Alternate Representations and Derived Tables
+
+Many databases let schema designers and administrators abstract the underlying
+“physical” table structure from the presentation given to clients, or to some
+specific clients, for any of a number of reasons. MySQL tries to let you do
+this, too! And fumbles it quite badly.
+
+* As mentioned above, non-trivial views are basically useless. Queries like
+ `SELECT some columns FROM a_view WHERE id = 53` are evaluated in the
+ stupidest -- and slowest -- possible way. Good luck hiding unusual
+ partitioning arrangements or a permissions check in a view if you want any
+ kind of performance.
+* The poor interactions between triggers and binary logging's default
+ configuration make it impractical to use triggers to maintain “materialized”
+ views to avoid the problems with “real” views.
+ * It also effectively means triggers can't be used to emulate `CHECK`
+ constraints and other consistency features.
+ * Code to maintain materialized views is also finicky and hard to get
+ “right,” especially if the view includes aggregates or interesting joins
+ over its source data. I hope you enjoy debugging MySQL's procedural
+ SQL…
+* For the relatively common case of wanting to abstract partitioned storage
+ away for clients, MySQL actually has [a
+ tool](https://dev.mysql.com/doc/refman/5.5/en/partitioning.html) for it! But
+ it comes with [enough caveats to strangle a
+ horse](https://dev.mysql.com/doc/refman/5.5/en/partitioning-limitations.html):
+ * It's a separate table engine wrapping a “real” storage engine, which
+ means it has its own, separate support for engine-specific features:
+ transactions, foreign keys, and index types, `AUTO_INCREMENT`, and
+ others. The syntax for configuring partitions makes selecting the wrong
+ underlying engine entirely too easy, too.
+ * Partitioned tables may not be the referrent of foreign keys: you can't
+ have both enforced relationships and this kind of storage management.
+ * MySQL doesn't actually know how to store partitions on separate disks or
+ filesystems. You still need to reach underneath of MySQL do to actual
+ storage management.
+ * Partitioning an InnoDB table under the default InnoDB configuration
+ stores all of the partitions in the global tablespace file anyways.
+ Helpful! For per-table configurations, they still all end up
+ together in the same file. Partitioning InnoDB tables is a waste of
+ time for managing storage.
+ * TL,DR: MySQL's partition support is so finicky and limited that
+ MySQL-based apps tend to opt for multiple MySQL servers (“sharding”)
+ instead.
+
+### Hosting Logic In The Database
+
+Yeah, yeah, the usual reaction to stored procedures and in-DB code is “eww,
+yuck!” for some not-terrible reasons, but hear me out on two points:
+
+* Under the freestanding-database-server paradigm, there will usually be
+ network latency between database clients and the database itself. There are
+ two ways to minimize the impact of that: move the data to the code in bulk
+ to minimize round-trips, or move the code to the data.
+* Some database administration tasks are better implemented using in-database
+ code than as freestanding clients: complex data migrations that can't be
+ expressed as freestanding SQL queries, for example.
+
+MySQL, as of version
+[5.0](https://dev.mysql.com/doc/relnotes/mysql/5.0/en/news-5-0-0.html)
+(released in 2003 -- remember that date, I'll come back to it), has support
+for in-database code via a procedural SQL-like dialect, like many other SQL
+databases. This includes server-side procedures (blocks of stored code that
+are invoked outside of any other statements and return statement-like
+results), functions (blocks of stored code that compute a result, used in any
+expression context such as a `SELECT` list or `WHERE` clause), and triggers
+(blocks of stored code that run whenever a row is created, modified, or
+deleted).
+
+Given the examples of
+[other](https://www.postgresql.org/docs/7.3/static/plpgsql.html)
+[contemporaneous](https://msdn.microsoft.com/en-US/library/ms189826(v=sql.90).aspx)
+[procedural](https://docs.oracle.com/cd/B10501_01/appdev.920/a96624/toc.htm)
+[languages](https://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd15-psql.html),
+MySQL's procedural dialect -- an implementation of the
+[SQL/PSM](https://en.wikipedia.org/wiki/SQL/PSM) language -- is quite limited:
+
+* There is no language construct for looping over a query result. This seems
+ like a pretty fundamental feature for a database-hosted language, but no.
+* There is no language construct for looping while a condition holds. This
+ seems like a pretty fundamental feature for an imperative language designed
+ any time after about 1975, but no.
+* There is no language construct for looping over a range.
+* There is, in fact, one language construct for looping: the unconditional
+ loop. All other iteration control is done via conditional `LEAVE`
+ statements, as
+
+ BEGIN
+ DECLARE c CURSOR FOR
+ SELECT foo, bar, baz
+ FROM some_table
+ WHERE some_condition;
+ DECLARE done INT DEFAULT 0;
+ DECLARE CONTINUE HANDLER FOR NOT FOUND
+ SET done = 1;
+
+ DECLARE c_foo INTEGER;
+ DECLARE c_bar INTEGER;
+ DECLARE c_baz INTEGER;
+
+ OPEN c;
+ process_some_table: LOOP
+ FETCH c INTO c_foo, c_bar, c_baz;
+ IF done THEN
+ LEAVE process_some_table;
+ END IF;
+
+ -- do something with c_foo, c_bar, c_baz
+ END LOOP;
+ END;
+
+ The original “structured programming” revolution in the 1960s seems to
+ have passed the MySQL team by.
+
+* Okay, I lied. There are two looping constructs: there's also the `REPEAT ...
+ UNTIL condition END REPEAT` construct, analogous to C's `do {} while
+ (!condition);` loop. But you still can't loop over query results, and you
+ can't run zero iterations of the loop's main body this way.
+* There is nothing resembling a modern exception system with automatic scoping
+ of handlers or declarative exception management. Error handling is entirely
+ via Visual Basic-style “on condition X, do Y” instructions, which remain in
+ effect for the rest of the program's execution.
+ * In the language shipped with MySQL 5.0, there wasn't a way to signal
+ errors, either: programmers had to resort to stunts like [intentionally
+ issuing failing
+ queries](https://stackoverflow.com/questions/465727/raise-error-within-mysql-function),
+ instead. Later versions of the language addressed this with the
+ [`SIGNAL`
+ statement](https://dev.mysql.com/doc/refman/5.5/en/signal.html): see,
+ they _can_ learn from better languages, eventually.
+* You can't escape to some other language, since MySQL doesn't have an
+ extension mechanism for server-side languages or a good way to call
+ out-of-process services during queries.
+
+The net result is that developing MySQL stored programs is unpleasant,
+uncomfortable, and far more error-prone than it could have been.
+
+## Why Is MySQL The Way It Is? { #by-design }
+
+MySQL's technology and history contain the seeds of all of these flaws.
+
+### Pluggable Storage Engines
+
+Very early in MySQL's life, the MySQL dev team realized that MyISAM was not
+the only way to store data, and opted to support other storage backends within
+MySQL. This is basically an alright idea; while I personally prefer storage
+systems that focus their effort on making one backend work very well,
+supporting multiple backends and letting third-party developers write their
+own is a pretty good approach too.
+
+Unfortunately, MySQL's storage backend interface puts a very low ceiling on
+the ways storage backends can make MySQL behave better.
+
+MySQL's data access paths through table engines are very simple: MySQL asks
+the engine to open a table, asks the engine to iterate through the table
+returning rows, filters the rows itself (outside of the storage engine), then
+asks the engine to close the table. Alternately, MySQL asks the engine to open
+a table, asks the engine to retrieve rows in range or for a single value over
+a specific index, filters the rows itself, and asks the engine to close the
+table.
+
+This simplistic interface frees table engines from having to worry about query
+optimization - in theory. Unfortunately, engine-specific features have a large
+impact on the performance of various query plans, but the channels back to the
+query planner provide very little granularity for estimating cost and prevent
+the planner from making good use of the engine in unusual cases. Conversely,
+the table engine system is totally isolated from the actual query, and can't
+make query-dependent performance choices “on its own.” There's no third path;
+the query planner itself is not pluggable.
+
+Similar consequences apply to type checking, support for new types, or even
+something as “obvious” as multiple automatic `TIMESTAMP` columns in the same
+table.
+
+Table manipulation -- creation, structural modification, and so on -- runs
+into similar problems. MySQL itself parses each `CREATE TABLE` statement, then
+hands off a parsed representation to the table engine so that it can manage
+storage. The parsed representation is lossy: there are plenty of forms MySQL's
+parser recognizes that aren't representable in a `TABLE` structure, preventing
+engines from implementing, say, column or tuple `CHECK` constraints without
+MySQL's help.
+
+The [sheer number of table
+engines](https://dev.mysql.com/doc/refman/5.5/en/storage-engines.html) makes
+that help very slow in coming. Any change to the table engine interface means
+perturbing the code to each engine, making progress on new MySQL-level
+features that interact with storage such as better query planning or new SQL
+constructs necessarily slow to implement and slow to test.
+
+### Held Back By History
+
+The original MySQL team focused on pure read performance and on “ease of use”
+(for new users with simple needs, as far as I can tell) over correctness and
+completeness, violating Knuth's laws of optimization. Many of these decisions
+locked MySQL into behaviours very early in its life that it still displays
+now. Features like implicit type conversions legitimately do help streamline
+development in very simple cases; experience with [other
+languages](http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/)
+unfortunately shows that the same behaviours sandbag development and help hide
+bugs in more sophisticated scenarios.
+
+MySQL has since changed hands, and the teams working on MySQL (and MariaDB,
+and Percona) are much more mature now than the team that made those early
+decisions. MySQL's massive and frequently non-savvy userbase makes it very hard
+to introduce breaking changes. At the same time, adding _optional_ breaking
+changes via server and client mode flags (such as `sql_mode`) increases the
+cognitive overhead of understanding MySQL's behaviours -- especially when that
+behaviour can vary from client to client, or when the server's configuration is
+out of the user's control (for example, on a shared host, or on EC2).
+
+A solution similar to Python's `from __future__ import` pragmas for making
+breaking changes opt-in some releases in advance of making them mandatory
+might help, but MySQL doesn't have the kind of highly-invested, highly-skilled
+user base that would make that effective -- and it still has all of the
+problems of modal behaviour.
+
+## Bad Arguments
+
+Inevitably, someone's going to come along and tell me how wrong I am and how
+MySQL is just fine as a database system. These people are everywhere, and they
+mean well too, and they are almost all wrong. There are two good reasons to
+use MySQL:
+
+1. **Some earlier group wrote for it, and we haven't finished porting our code
+ off of MySQL.**
+2. **We've considered all of these points, and many more, and decided that
+ `___feature_x___` that MySQL offers is worth the hassle.**
+
+Unfortunately, these aren't the reasons people do give, generally. The
+following are much more common:
+
+* **It's good enough.** No it ain't. There are plenty of other equally-capable
+ data storage systems that don't come with MySQL's huge raft of edge cases
+ and quirks.
+ * **We haven't run into these problems.** Actually, a lot of these
+ problems happen _silently_. Odds are, unless you write your queries and
+ schema statements with the manual open and refer back to it constantly,
+ or have been using MySQL since the 3.x era _daily_, at least some of
+ these issues have bitten you. The ones that prevent you from using your
+ database intelligently are very hard to notice in action.
+* **We already know how to use it.** MySQL development and administration
+ causes brain damage, folks, the same way PHP does. Where PHP teaches
+ programmers that “array” is the only structure you need, MySQL teaches
+ people that databases are awkward, slow, hard-to-tune monsters that require
+ constant attention. That doesn't have to be true; there are comfortable,
+ fast, and easily-tuned systems out there that don't require daily care and
+ feeding or the love of a specialist.
+* **It's the only thing our host supports.** [Get](https://linode.com/) [a](https://www.heroku.com/) [better](https://gandi.net/) [host](https://www.engineyard.com). It's
+ not like they're expensive or hard to find.
+ * **We used it because it was there.** Please hire some fucking software
+ developers and go back to writing elevator pitches and flirting with Y
+ Combinator.
+* **Everybody knows MySQL. It's easy to hire MySQL folks.** It's easy to hire
+ MCSEs, too, but you should be hiring for attitude and ability to learn, not
+ for specific skillsets, if you want to run a successful software project.
+ * **It's popular.** Sure, and nobody ever got fired for buying
+ IBM/Microsoft/Adobe. Popularity isn't any indication of quality, and if
+ we let popularity dictate what technology we use and improve we'll never
+ get anywhere. Marketing software to geeks is _easy_ - it's just that
+ lots of high-quality projects don't bother.
+* **It's lightweight.** So's [SQLite 3](https://www.sqlite.org) or
+ [H2](https://www.h2database.com/html/main.html). If you care about deployment
+ footprint more than any other factor, MySQL is actually pretty clunky (and
+ embedded MySQL has even bigger problems than freestanding MySQL).
+* **It's getting better, so we might as well stay on it.** [It's
+ true](https://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html), if you go
+ by feature checklists and the manual, MySQL is improving “rapidly.” 5.6 is
+ due out soon and superficially looks to contain a number of good changes. I
+ have two problems with this line of reasoning:
+ 1. Why wait? Other databases are good _now_, not _eventually_.
+ 2. MySQL has a history of providing the bare minimum to satisfy a feature
+ checkbox without actually making the feature work well, work consistently,
+ or work in combination with other features.