Restore "mysql/choose-something-else" article.

Restore of the file dropped from 0d6f58c54a7af6c8b4e6cd98663eb36ec4e3accc to an assumed now-correct location to restore accessibility at the original URI. Additionally HTTPS'd many of the links, still requires verification pass for dead links.
author: Alice Bevan-McGregor <alice@gothcandy.com> 2020-04-14 18:06:43 -0400
committer: Alice Bevan-McGregor <alice@gothcandy.com> 2020-04-14 18:06:43 -0400
commit: 21f542a157971f90be1dcbadaad9eaa265ba0a62 (patch)
tree: b7a8da50e17bc780ed9242d89a11189c7f5e9452
parent: 34708dfa902afabf4833c25233132e56514915de (diff)
1 files changed, 736 insertions, 0 deletions
diff --git a/docs/mysql/choose-something-else.md b/docs/mysql/choose-something-else.md
new file mode 100644
index 0000000..738950f
--- /dev/null
+++ b/docs/mysql/choose-something-else.md
@@ -0,0 +1,736 @@
+# Do Not Pass This Way Again
+
+Considering MySQL? Use something else. Already on MySQL? Migrate. For every
+successful project built on MySQL, you could uncover a history of time wasted
+mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense
+of accomplishment over the effort spent making MySQL behave.
+
+Thesis: databases fill roles ranging from pure storage to complex and
+interesting data processing; MySQL is differently bad at both tasks. Real apps
+all fall somewhere between these poles, and suffer variably from both sets of
+MySQL flaws.
+
+* MySQL is bad at [storage](#storage).
+* MySQL is bad at [data processing](#data-processing).
+* MySQL is bad [by design](#by-design).
+* [Bad arguments](#bad-arguments) for using MySQL.
+
+Much of this is inspired by the principles behind [PHP: A Fractal of Bad
+Design](http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/). I
+suggest reading that article too -- it's got a lot of good thought in it even
+if you already know to stay well away from PHP. (If that article offends you,
+well, this page probably will too.)
+
+## Storage
+
+Storage systems have four properties:
+
+1. Take and store data they receive from applications.
+2. Keep that data safe against loss or accidental change.
+3. Provide stored data to applications on demand.
+4. Give administrators effective management tools.
+
+In a truly “pure” storage application, data-comprehension features
+(constraints and relationships, nontrivial functions and aggregates) would go
+totally unused. There is a time and a place for this: the return of “NoSQL”
+storage systems attests to that.
+
+Pure storage systems tend to be closely coupled to their “main” application:
+consider most web/server app databases. “Secondary” clients tend to be
+read-only (reporting applications, monitoring) or to be utilities in service
+of the main application (migration tools, documentation tools). If you believe
+constraints, validity checks, and other comprehension features can be
+implemented in “the application,” you are probably thinking of databases close
+to this pole.
+
+### Storing Data
+
+MySQL has many edge cases which reduce the predictability of its behaviour
+when storing information. Most of these edge cases are documented, but violate
+the principle of least surprise (not to mention the expectations of users
+familiar with other SQL implementations).
+
+* Implicit conversions (particularly to and from string types) can modify
+  MySQL's behaviour.
+    * Many implicit conversions are also silent (no warning, no diagnostic),
+      by design, making it more likely developers are entirely unaware of
+      them until one does something surprising.
+* Conversions that violate basic constraints (range, length) of the output
+  type often coerce data rather than failing.
+    * Sometimes this raises a warning; does your app check for those?
+    * This behaviour is unlike many typed systems (but closely like PHP and
+      remotely like Perl).
+* Conversion behaviour depends on a per-connection configuration value
+  (`sql_mode`) that has [a large constellation of possible
+  states](https://dev.mysql.com/doc/refman/5.5/en/sql-mode.html), making
+  it harder to carry expectations from manual testing over to code or from
+  tool to tool.
+* MySQL recommends UTF-8 as a character-set, but still defaults to Latin-1.
+  The implimentation of `utf8` up until MySQL 5.5 was only the 3-byte
+  [BMP](https://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane).
+  MySQL 5.5 and beyond supports a 4-byte `utf8`, but confusingly must be set
+  with the character-set `utf8mb4`. Implementation details of these encodings
+  within MySQL, such as the `utf8` 3-byte limit, tend to leak out into client
+  applications. Data that does not fit MySQL's understanding of the storage
+  encoding will be transformed until it does, by truncation or replacement, by
+  default.
+    * Collation support is per-encoding, with one of the stranger default
+      configurations: by default, the collation orders characters according to
+      Swedish alphabetization rules, case-insensitively.
+    * Since it's the default, lots of folks who don't know the manual
+      inside-out and backwards observe MySQL's case-insensitive collation
+      behaviour (`'a' = 'A'`) and conclude that “MySQL is case-insensitive,”
+      complicating any effort to use a case-sensitive locale.
+    * Both the encoding and the collation can vary, independently, by
+      _column_. Do you keep your schema definition open when you write
+      queries to watch out for this sort of shit?
+* The `TIMESTAMP` type tries to do something smart by storing values in a
+  canonical timezone (UTC), but it's done with so few affordances that it's
+  very hard to even _tell_ that MySQL's done a right thing with your data.
+    * And even after that, the result of `foo < '2012-04-01 09:00:00'` still
+      depends on what time of year it is when you evaluate the query, unless
+      you're very careful with your connection timezone.
+    * `TIMESTAMP` is also special-cased in MySQL's schema definition handling,
+      making it easy to accidentally create (or to accidentally fail to
+      create) an auto-updating field when you didn't (did) want one.
+    * `DATETIME` does not get the same timezone handling `TIMESTAMP` does.
+      What? And you can't provide your own without resorting to hacks like
+      extra columns.
+    * Oh, did you want to _use_ MySQL's timezone support? Too bad, none of
+      that data's loaded by default. You have to process the OS's `tzinfo`
+      files into SQL with a separate tool and import that. If you ever want to
+      update MySQL's timezone settings later, you need to take the server down
+      just to make sure the changes apply.
+
+### Preserving Data
+
+... against unexpected changes: like most disk-backed storage systems, MySQL
+is as reliable as the disks and filesystems its data lives on. MySQL provides
+no additional functionality in terms of mirroring or hardware failure tolerance
+(such as [Oracle ASM](https://en.wikipedia.org/wiki/Automatic_Storage_Management)).
+However this is a limitation shared with many, _many_ other systems.
+
+When using the InnoDB storage engine (default since MySQL 5.5), MySQL maintains page
+checksums in order to detect corruption caused by underlying storage.  However,
+many third-party software applications, as sell as users upgrading
+from earlier versions of MySQL may be using MyISAM, which will frequently corrupt
+data files on improper shutdown.
+
+The implicit conversion rules that bite when storing data also bite when
+asking MySQL to modify data - my favourite example being a fat-fingered
+`UPDATE` query where a mistyped `=` (as `-`, off by a single key) caused 90%
+of the rows in the table to be affected, instead of one row, because of
+implicit string-to-integer conversions.
+
+... against loss: hoo boy. MySQL, out of the box, gives you three approaches
+to [backups](https://dev.mysql.com/doc/refman/5.5/en/backup-methods.html):
+
+* Take “blind” filesystem backups with `tar` or `rsync`. Unless you
+  meticulously lock tables or make the database read-only for the duration,
+  this produces a backup that requires crash recovery before it will be
+  usable, and can produce an inconsistent database.
+    * This can bite quite hard if you use InnoDB, as InnoDB crash recovery
+      takes time proportional to both the number of InnoDB tables and the
+      total size of InnoDB tables, with a large constant.
+* Dump to SQL with `mysqldump`: slow, relatively large backups, and
+  non-incremental.
+* Archive binary logs: fragile, complex, over-configurable, and configured
+  badly by default. (Binary logging is also the basis of MySQL's replication
+  system.)
+
+If neither of these are sufficient, you're left with purchasing [a backup tool
+from
+Oracle](https://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_mysql_enterprise_backup)
+or from one of the third-party MySQL vendors.
+
+Like many of MySQL's features, the binary logging feature is
+[too](https://dev.mysql.com/doc/refman/5.5/en/binary-log.html)
+[configurable](https://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html),
+while still, somehow, defaulting to modes that are hazardous or surprising:
+the
+[default](https://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html#sysvar_binlog_format)
+[behaviour](https://dev.mysql.com/doc/refman/5.5/en/replication-formats.html)
+is to log SQL statements, rather than logging their side effects. This has
+lead to numerous bugs over the years; MySQL (now) makes an effort to make
+common “non-deterministic” cases such as `NOW()` and `RANDOM()` act
+deterministically but these have been addressed using ad-hoc solutions.
+Restoring binary-log-based backups can easily lead to data that differs from
+the original system, and by the time you've noticed the problem, it's too late
+to do anything about it.
+
+(Seriously. The binary log entries for each statement contain the “current”
+time on the master and the random seed at the start of the statement, just in
+case. If your non-deterministic query uses any other function, you're still
+[fucked by
+default](https://dev.mysql.com/doc/refman/5.5/en/replication-sbr-rbr.html#replication-sbr-rbr-sbr-disadvantages).)
+
+Additionally, a number of apparently-harmless features can lead to backups or
+replicas wandering out of sync with the original database, in the default
+configuration:
+
+* `AUTO_INCREMENT` and `UPDATE` statements.
+* `AUTO_INCREMENT` and `INSERT` statements (sometimes). SURPRISE.
+* Triggers.
+* User-defined (native) functions.
+* Stored (procedural SQL) functions.
+* `DELETE ... LIMIT` and `UPDATE ... LIMIT` statements, though if you use
+  these, you've misunderstood how SQL is supposed to work.
+* `INSERT ... ON DUPLICATE KEY UPDATE` statements.
+* Bulk-loading data with `LOAD DATA` statements.
+* [Operations on floating-point
+  values](https://dev.mysql.com/doc/refman/5.5/en/replication-features-floatvalues.html).
+
+### Retrieving Data
+
+This mostly works as expected. Most of the ways MySQL will screw you happen
+when you store data, not when you retrieve it. However, there are a few things
+that implicitly transform stored data before returning it:
+
+* MySQL's surreal type conversion system works the same way during `SELECT`
+  that it works during other operations, which can lead to queries matching
+  unexpected rows:
+
+        owen@scratch> CREATE TABLE account (
+            ->     accountid INTEGER
+            ->         AUTO_INCREMENT
+            ->         PRIMARY KEY,
+            ->     discountid INTEGER
+            -> );
+        Query OK, 0 rows affected (0.54 sec)
+
+        owen@scratch> INSERT INTO account
+            ->     (discountid)
+            -> VALUES
+            ->     (0),
+            ->     (1),
+            ->     (2);
+        Query OK, 3 rows affected (0.03 sec)
+        Records: 3  Duplicates: 0  Warnings: 0
+
+        owen@scratch> SELECT *
+            -> FROM account
+            -> WHERE discountid = 'banana';
+        +-----------+------------+
+        | accountid | discountid |
+        +-----------+------------+
+        |         1 |          0 |
+        +-----------+------------+
+        1 row in set, 1 warning (0.05 sec)
+
+    Ok, unexpected, but there's at least a warning (do your apps check for
+    those?) - let's see what it says:
+
+        owen@scratch> SHOW WARNINGS;
+        +---------+------+--------------------------------------------+
+        | Level   | Code | Message                                    |
+        +---------+------+--------------------------------------------+
+        | Warning | 1292 | Truncated incorrect DOUBLE value: 'banana' |
+        +---------+------+--------------------------------------------+
+        1 row in set (0.03 sec)
+
+    I can count on one hand the number of `DOUBLE` columns in this example and
+    still have five fingers left over.
+
+    You might think this is an unreasonable example: maybe you should always
+    make sure your argument types exactly match the field types, and the query
+    should use `57` instead of `'banana'`. (This does actually “fix” the
+    problem.) It's unrealistic to expect every single user to run `SHOW CREATE
+    TABLE` before every single query, or to memorize the types of every column
+    in your schema, though. This example derived from a technically-skilled
+    but MySQL-ignorant tester examining MySQL data to verify some behavioural
+    changes in an app.
+
+    * Actually, you don't even need a table for this: `SELECT 0 = 'banana'`
+      returns `1`. Did the [PHP](http://phpsadness.com/sad/52) folks design
+      MySQL's `=` operator?
+
+    * This isn't affected by `sql_mode`, even though so many other things are.
+
+* `TIMESTAMP` columns (and _only_ `TIMESTAMP` columns) can return
+  apparently-differing values for the same stored value depending on
+  per-connection configuration even during read-only operation. This is done
+  silently and the default behaviour can change as a side effect of non-MySQL
+  configuration changes in the underlying OS.
+* String-typed columns are transformed for encoding on output if the
+  connection is not using the same encoding as the underlying storage, using
+  the same rules as the transformation on input.
+* Values that stricter `sql_mode` settings would reject during storage can
+  still be returned during retrieval; it is impossible to predict in advance
+  whether such data exists, since clients are free to set `sql_mode` to any
+  value at any time.
+
+### Efficiency
+
+For purely store-and-retrieve applications, MySQL's query planner (which
+transforms the miniature program contained in each SQL statement into a tree
+of disk access and data manipulation steps) is sufficient, but only barely.
+Queries that retrieve data from one table, or from one table and a small
+number of one-to-maybe-one related tables, produce relatively efficient plans.
+
+MySQL, however, offers a number of tuning options that can have dramatic and
+counterintuitive effects, and the documentation provides very little advice
+for choosing settings. Tuning relies on the administrator's personal
+experience, blog articles of varying quality, and consultants.
+
+* The MySQL query cache defaults to a non-zero size in some commonly-installed
+  configurations. However, the larger the cache, the slower writes proceed:
+  invalidating cache entries that include the tables modified by a query means
+  considering every entry in the cache. This cache also uses MySQL's LRU
+  implementation, which has its own performance problems during eviction that
+  get worse with larger cache sizes.
+* Memory-management settings, including `key_buffer_size` and `innodb_buffer_pool_size`,
+  have non-linear relationships with performance. The [standard](http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/)
+  [advice](http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/) advises
+  making whichever value you care about more to a large value, but this can be
+  counterproductive if the related data is larger than the pool can hold:
+  MySQL is once again bad at discarding old buffer pages when the buffer is
+  exhausted, leading to dramatic slowdowns when query load reaches a certain
+  point.
+    * This also affects filesystem tuning settings such as `table_open_cache`.
+* InnoDB, out of the box, comes configured to use one large (and automatically
+  growing) tablespace file for all tables, complicating backups and storage
+  management. This is fine for trivial databases, but MySQL provides no tools
+  (aside from `DROP TABLE` and reloading the data from an SQL dump) for
+  transplanting a table to another tablespace, and provides no tools (aside
+  from a filesystem-level `rm`, and reloading _all_ InnoDB data from an SQL
+  dump) for reclaiming empty space in a tablespace file.
+* MySQL itself provides very few tools to manage storage; tasks like storing
+  large or infrequently-accessed tables and databases on dedicated filesystems
+  must be done on the filesystem, with MySQL shut down.
+
+## Data Processing
+
+Data processing encompasses tasks that require making decisions about data and
+tasks that derive new data from existing data. This is a huge range of topics:
+
+* Deciding (and enforcing) application-specific validity rules.
+* Summarizing and deriving data.
+* Providing and maintaining alternate representations and structures.
+* Hosting complex domain logic near the data it operates on.
+
+The further towards data processing tasks applications move, the more their
+SQL resembles tiny programs sent to the data. MySQL is totally unprepared for
+programs, and expects SQL to retrieve or modify simple rows.
+
+### Validity
+
+Good constraints are like `assert`s: in an ideal world, you can't tell if they
+work, because your code never violates them. Here in the real world,
+constraint violations happen for all sorts of reasons, ranging from buggy code
+to buggy human cognition. A good database gives you more places to describe
+your expectations and more tools for detecting and preventing surprises.
+MySQL, on the other hand, can't validate your data for you, beyond simple (and
+fixed) type constraints:
+
+* As with the data you store in it, MySQL feels free to change your table
+  definitions [implicitly and
+  silently](https://dev.mysql.com/doc/refman/5.5/en/silent-column-changes.html).
+  Many of these silent schema changes have important performance and
+  feature-availability implications.
+    * Foreign keys are ignored if you spell them certain, common, ways:
+
+            CREATE TABLE foo (
+                -- ...,
+                parent INTEGER
+                    NOT NULL
+                    REFERENCES foo_parent (id)
+                -- , ...
+            )
+
+        silently ignores the foreign key specification, while
+
+            CREATE TABLE foo (
+                -- ...,
+                parent INTEGER
+                    NOT NULL,
+                FOREIGN KEY (parent)
+                    REFERENCES foo_parent (id)
+                -- , ...
+            )
+
+        preserves it.
+
+* Foreign keys, one of the most widely-used database validity checks, are an
+  engine-specific feature, restricting their availability in combination with
+  other engine-specific features. (For example, a table cannot have both
+  foreign key constraints and full-text indexes, as of MySQL 5.5.)
+    * Configurations that violate assumptions about foreign keys, such as a
+      foreign key pointing into a MyISAM or NDB table, do not cause warnings
+      or any other diagnostics. The foreign key is simply discarded. SURPRISE.
+      (MySQL is riddled with these sorts of surprises, and apologists lean
+      very heavily on the “that's documented” excuse for its bad behaviour.)
+* The MySQL parser recognizes `CHECK` clauses, which allow schema developers
+  to make complex declarative assertions about tuples in the database, but
+  [discards them without
+  warning](https://dev.mysql.com/doc/refman/5.5/en/create-table.html). If you
+  want `CHECK`-like constraints, you must implement them as triggers - but see
+  below...
+* MySQL's comprehension of the `DEFAULT` clause is, uh, limited: only
+  constants are permitted, except for the [special
+  case](https://dev.mysql.com/doc/refman/5.5/en/timestamp-initialization.html)
+  of at most one `TIMESTAMP` column per table and at most one sequence-derived
+  column. Who designed this mess?
+    * Furthermore, there's no way to say “no default” and raise an error when
+      an INSERT forgets to provide a value. The default `DEFAULT` is either
+      `NULL` or a zero-like constant (`0`, `''`, and so on). Even for types
+      with no meaningful zero-like values (`DATETIME`).
+* MySQL has no mechanism for introducing new types, which might otherwise
+  provide a route to enforcing validity. Counting the number of special cases
+  in MySQL's [existing type
+  system](https://dev.mysql.com/doc/refman/5.5/en/data-types.html) illustrates
+  why that's probably unfixable.
+
+I hope every client with write access to your data is absolutely perfect,
+because MySQL _cannot help you_ if you make a mistake.
+
+### Summarizing and Deriving Data
+
+SQL databases generally provide features for doing “interesting” things with
+sets of tuples, and MySQL is no exception. However, MySQL's limitations mean
+that actually processing data in the database is fraught with wasted money,
+brains, and time:
+
+* Aggregate (`GROUP BY`) queries run up against limits in MySQL's query
+  planner: a query with both `WHERE` and `GROUP BY` clauses can only satisfy
+  one constraint or the other with indexes, unless there's an index that
+  covers all the relevant fields in both clauses, in the right order. (What
+  this order is depends on the complexity of the query and on the distribution
+  of the underlying data, but that's hardly MySQL-specific.)
+    * If you have all three of `WHERE`, `GROUP BY`, and `ORDER BY` in the same
+      query, you're more or less fucked. Good luck designing a single index
+      that satisfies all three.
+* Even though MySQL allows database administrators to [define normal functions
+  in a procedural SQL
+  dialect](https://dev.mysql.com/doc/refman/5.5/en/create-procedure.html),
+  [custom aggregate
+  functions](https://dev.mysql.com/doc/refman/5.5/en/create-function-udf.html)
+  can only be defined by native plugins. Good thing, too, because procedural
+  SQL in MySQL is its own kind of awful - more on that below.
+* Subqueries are often convenient and occasionally necessary for expressing
+  multi-step transformations on some underlying data. MySQL's query planner
+  has only one strategy for optimizing them: evaluate the innermost query as
+  written, into an in-memory table, then use a nested loop to satisfy joins or
+  `IN` clauses. For large subquery results or interestingly nested subqueries,
+  this is absurdly slow.
+    * MySQL's query planner can't fold constraints from outer queries into
+      subqueries.
+    * The generated in-memory table never has any indexes, ever, even when
+      appropriate indexes are “obvious” from the surrounding query; you cannot
+      even specify them.
+    * These limitations also affect views, which are evaluated as if they were
+      subqueries. In combination with the lack of constraint folding in the
+      planner, this makes filtering or aggregating over large views completely
+      impractical.
+    * MySQL lacks [common table
+      expressions](https://www.postgresql.org/docs/9.2/static/queries-with.html).
+      Even if subquery efficiency problems get fixed, the inability to give
+      meaningful names to subqueries makes them hard to read and comprehend.
+    * I hope you like `CREATE TEMPORARY TABLE AS SELECT`, because that's your
+      only real alternative.
+* [Window
+  functions](https://en.wikipedia.org/wiki/Select_(SQL)#Window_function) do not
+  exist at all in MySQL. This complicates many kinds of analysis, including
+  time series analyses and ranking analyses.
+    * Specific cases (for example, assigning rank numbers to rows) can be
+      implemented using [server-side variables and side effects during
+      `SELECT`](https://stackoverflow.com/questions/6473800/assigning-row-rank-numbers).
+      What? Good luck understanding that code in six months.
+* Even interesting joins run into trouble. MySQL's query planner has trouble
+  with a number of cases that can easily arise in well-normalized data:
+    * Joining and ordering by rows from multiple tables often forces MySQL to
+      dump the whole join to a temporary table, then sort it -- awful,
+      especially if you then use `LIMIT BY` to paginate the results.
+    * `JOIN` clauses with non-trivial conditions, such as joins by range or
+      joins by similarity, generally cause the planner to revert to table
+      scans even if the same condition would be indexable outside of a join.
+    * Joins with `WHERE` clauses that span both tables, where the rows
+      selected by the `WHERE` clause are outliers relative to the table
+      statistics, often cause MySQL to access tables in suboptimal order.
+* Ok, forget about interesting joins. Even interesting `WHERE` clauses can run
+  into trouble: MySQL can't index deterministic functions of a row, either.
+  While some deterministic functions can be eliminated from the `WHERE` clause
+  using simple algebra, many useful cases (whitespace-insensitive comparison,
+  hash-based comparisons, and so on) can't.
+    * You can fake these by storing the computed value in the row alongside
+      the “real” value. This leaves your schema with some ugly data repetition
+      and a chance for the two to fall out of sync, and clients must use the
+      “computed” column explicitly.
+    * Oh, and they must maintain the “computed” version explicitly.
+    * Or you can use triggers. Ha. See above.
+
+And now you know why MySQL advocates are such big fans of doing data
+_processing_ in “the client” or “the app.”
+
+### Alternate Representations and Derived Tables
+
+Many databases let schema designers and administrators abstract the underlying
+“physical” table structure from the presentation given to clients, or to some
+specific clients, for any of a number of reasons. MySQL tries to let you do
+this, too! And fumbles it quite badly.
+
+* As mentioned above, non-trivial views are basically useless. Queries like
+  `SELECT some columns FROM a_view WHERE id = 53` are evaluated in the
+  stupidest -- and slowest -- possible way. Good luck hiding unusual
+  partitioning arrangements or a permissions check in a view if you want any
+  kind of performance.
+* The poor interactions between triggers and binary logging's default
+  configuration make it impractical to use triggers to maintain “materialized”
+  views to avoid the problems with “real” views.
+    * It also effectively means triggers can't be used to emulate `CHECK`
+      constraints and other consistency features.
+    * Code to maintain materialized views is also finicky and hard to get
+      “right,” especially if the view includes aggregates or interesting joins
+      over its source data. I hope you enjoy debugging MySQL's procedural
+      SQL…
+* For the relatively common case of wanting to abstract partitioned storage
+  away for clients, MySQL actually has [a
+  tool](https://dev.mysql.com/doc/refman/5.5/en/partitioning.html) for it! But
+  it comes with [enough caveats to strangle a
+  horse](https://dev.mysql.com/doc/refman/5.5/en/partitioning-limitations.html):
+    * It's a separate table engine wrapping a “real” storage engine, which
+      means it has its own, separate support for engine-specific features:
+      transactions, foreign keys, and index types, `AUTO_INCREMENT`, and
+      others. The syntax for configuring partitions makes selecting the wrong
+      underlying engine entirely too easy, too.
+    * Partitioned tables may not be the referrent of foreign keys: you can't
+      have both enforced relationships and this kind of storage management.
+    * MySQL doesn't actually know how to store partitions on separate disks or
+      filesystems. You still need to reach underneath of MySQL do to actual
+      storage management.
+        * Partitioning an InnoDB table under the default InnoDB configuration
+          stores all of the partitions in the global tablespace file anyways.
+          Helpful! For per-table configurations, they still all end up
+          together in the same file. Partitioning InnoDB tables is a waste of
+          time for managing storage.
+    * TL,DR: MySQL's partition support is so finicky and limited that
+      MySQL-based apps tend to opt for multiple MySQL servers (“sharding”)
+      instead.
+
+### Hosting Logic In The Database
+
+Yeah, yeah, the usual reaction to stored procedures and in-DB code is “eww,
+yuck!” for some not-terrible reasons, but hear me out on two points:
+
+* Under the freestanding-database-server paradigm, there will usually be
+  network latency between database clients and the database itself. There are
+  two ways to minimize the impact of that: move the data to the code in bulk
+  to minimize round-trips, or move the code to the data.
+* Some database administration tasks are better implemented using in-database
+  code than as freestanding clients: complex data migrations that can't be
+  expressed as freestanding SQL queries, for example.
+
+MySQL, as of version
+[5.0](https://dev.mysql.com/doc/relnotes/mysql/5.0/en/news-5-0-0.html)
+(released in 2003 -- remember that date, I'll come back to it), has support
+for in-database code via a procedural SQL-like dialect, like many other SQL
+databases. This includes server-side procedures (blocks of stored code that
+are invoked outside of any other statements and return statement-like
+results), functions (blocks of stored code that compute a result, used in any
+expression context such as a `SELECT` list or `WHERE` clause), and triggers
+(blocks of stored code that run whenever a row is created, modified, or
+deleted).
+
+Given the examples of
+[other](https://www.postgresql.org/docs/7.3/static/plpgsql.html)
+[contemporaneous](https://msdn.microsoft.com/en-US/library/ms189826(v=sql.90).aspx)
+[procedural](https://docs.oracle.com/cd/B10501_01/appdev.920/a96624/toc.htm)
+[languages](https://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd15-psql.html),
+MySQL's procedural dialect -- an implementation of the
+[SQL/PSM](https://en.wikipedia.org/wiki/SQL/PSM) language -- is quite limited:
+
+* There is no language construct for looping over a query result. This seems
+  like a pretty fundamental feature for a database-hosted language, but no.
+* There is no language construct for looping while a condition holds. This
+  seems like a pretty fundamental feature for an imperative language designed
+  any time after about 1975, but no.
+* There is no language construct for looping over a range.
+* There is, in fact, one language construct for looping: the unconditional
+  loop. All other iteration control is done via conditional `LEAVE`
+  statements, as
+
+        BEGIN
+            DECLARE c CURSOR FOR
+                SELECT foo, bar, baz
+                FROM some_table
+                WHERE some_condition;
+            DECLARE done INT DEFAULT 0;
+            DECLARE CONTINUE HANDLER FOR NOT FOUND
+                SET done = 1;
+
+            DECLARE c_foo INTEGER;
+            DECLARE c_bar INTEGER;
+            DECLARE c_baz INTEGER;
+
+            OPEN c;
+            process_some_table: LOOP
+                FETCH c INTO c_foo, c_bar, c_baz;
+                IF done THEN
+                    LEAVE process_some_table;
+                END IF;
+
+                -- do something with c_foo, c_bar, c_baz
+            END LOOP;
+        END;
+
+    The original “structured programming” revolution in the 1960s seems to
+    have passed the MySQL team by.
+
+* Okay, I lied. There are two looping constructs: there's also the `REPEAT ...
+  UNTIL condition END REPEAT` construct, analogous to C's `do {} while
+  (!condition);` loop. But you still can't loop over query results, and you
+  can't run zero iterations of the loop's main body this way.
+* There is nothing resembling a modern exception system with automatic scoping
+  of handlers or declarative exception management. Error handling is entirely
+  via Visual Basic-style “on condition X, do Y” instructions, which remain in
+  effect for the rest of the program's execution.
+    * In the language shipped with MySQL 5.0, there wasn't a way to signal
+      errors, either: programmers had to resort to stunts like [intentionally
+      issuing failing
+      queries](https://stackoverflow.com/questions/465727/raise-error-within-mysql-function),
+      instead. Later versions of the language addressed this with the
+      [`SIGNAL`
+      statement](https://dev.mysql.com/doc/refman/5.5/en/signal.html): see,
+      they _can_ learn from better languages, eventually.
+* You can't escape to some other language, since MySQL doesn't have an
+  extension mechanism for server-side languages or a good way to call
+  out-of-process services during queries.
+
+The net result is that developing MySQL stored programs is unpleasant,
+uncomfortable, and far more error-prone than it could have been.
+
+## Why Is MySQL The Way It Is? { #by-design }
+
+MySQL's technology and history contain the seeds of all of these flaws.
+
+### Pluggable Storage Engines
+
+Very early in MySQL's life, the MySQL dev team realized that MyISAM was not
+the only way to store data, and opted to support other storage backends within
+MySQL. This is basically an alright idea; while I personally prefer storage
+systems that focus their effort on making one backend work very well,
+supporting multiple backends and letting third-party developers write their
+own is a pretty good approach too.
+
+Unfortunately, MySQL's storage backend interface puts a very low ceiling on
+the ways storage backends can make MySQL behave better.
+
+MySQL's data access paths through table engines are very simple: MySQL asks
+the engine to open a table, asks the engine to iterate through the table
+returning rows, filters the rows itself (outside of the storage engine), then
+asks the engine to close the table. Alternately, MySQL asks the engine to open
+a table, asks the engine to retrieve rows in range or for a single value over
+a specific index, filters the rows itself, and asks the engine to close the
+table.
+
+This simplistic interface frees table engines from having to worry about query
+optimization - in theory. Unfortunately, engine-specific features have a large
+impact on the performance of various query plans, but the channels back to the
+query planner provide very little granularity for estimating cost and prevent
+the planner from making good use of the engine in unusual cases. Conversely,
+the table engine system is totally isolated from the actual query, and can't
+make query-dependent performance choices “on its own.” There's no third path;
+the query planner itself is not pluggable.
+
+Similar consequences apply to type checking, support for new types, or even
+something as “obvious” as multiple automatic `TIMESTAMP` columns in the same
+table.
+
+Table manipulation -- creation, structural modification, and so on -- runs
+into similar problems. MySQL itself parses each `CREATE TABLE` statement, then
+hands off a parsed representation to the table engine so that it can manage
+storage. The parsed representation is lossy: there are plenty of forms MySQL's
+parser recognizes that aren't representable in a `TABLE` structure, preventing
+engines from implementing, say, column or tuple `CHECK` constraints without
+MySQL's help.
+
+The [sheer number of table
+engines](https://dev.mysql.com/doc/refman/5.5/en/storage-engines.html) makes
+that help very slow in coming. Any change to the table engine interface means
+perturbing the code to each engine, making progress on new MySQL-level
+features that interact with storage such as better query planning or new SQL
+constructs necessarily slow to implement and slow to test.
+
+### Held Back By History
+
+The original MySQL team focused on pure read performance and on “ease of use”
+(for new users with simple needs, as far as I can tell) over correctness and
+completeness, violating Knuth's laws of optimization. Many of these decisions
+locked MySQL into behaviours very early in its life that it still displays
+now. Features like implicit type conversions legitimately do help streamline
+development in very simple cases; experience with [other
+languages](http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/)
+unfortunately shows that the same behaviours sandbag development and help hide
+bugs in more sophisticated scenarios.
+
+MySQL has since changed hands, and the teams working on MySQL (and MariaDB,
+and Percona) are much more mature now than the team that made those early
+decisions. MySQL's massive and frequently non-savvy userbase makes it very hard
+to introduce breaking changes. At the same time, adding _optional_ breaking
+changes via server and client mode flags (such as `sql_mode`) increases the
+cognitive overhead of understanding MySQL's behaviours -- especially when that
+behaviour can vary from client to client, or when the server's configuration is
+out of the user's control (for example, on a shared host, or on EC2).
+
+A solution similar to Python's `from __future__ import` pragmas for making
+breaking changes opt-in some releases in advance of making them mandatory
+might help, but MySQL doesn't have the kind of highly-invested, highly-skilled
+user base that would make that effective -- and it still has all of the
+problems of modal behaviour.
+
+## Bad Arguments
+
+Inevitably, someone's going to come along and tell me how wrong I am and how
+MySQL is just fine as a database system. These people are everywhere, and they
+mean well too, and they are almost all wrong. There are two good reasons to
+use MySQL:
+
+1. **Some earlier group wrote for it, and we haven't finished porting our code
+    off of MySQL.**
+2. **We've considered all of these points, and many more, and decided that
+    `___feature_x___` that MySQL offers is worth the hassle.**
+
+Unfortunately, these aren't the reasons people do give, generally. The
+following are much more common:
+
+* **It's good enough.** No it ain't. There are plenty of other equally-capable
+  data storage systems that don't come with MySQL's huge raft of edge cases
+  and quirks.
+    * **We haven't run into these problems.** Actually, a lot of these
+      problems happen _silently_. Odds are, unless you write your queries and
+      schema statements with the manual open and refer back to it constantly,
+      or have been using MySQL since the 3.x era _daily_, at least some of
+      these issues have bitten you. The ones that prevent you from using your
+      database intelligently are very hard to notice in action.
+* **We already know how to use it.** MySQL development and administration
+  causes brain damage, folks, the same way PHP does. Where PHP teaches
+  programmers that “array” is the only structure you need, MySQL teaches
+  people that databases are awkward, slow, hard-to-tune monsters that require
+  constant attention. That doesn't have to be true; there are comfortable,
+  fast, and easily-tuned systems out there that don't require daily care and
+  feeding or the love of a specialist.
+* **It's the only thing our host supports.** [Get](https://linode.com/) [a](https://www.heroku.com/) [better](https://gandi.net/) [host](https://www.engineyard.com). It's
+  not like they're expensive or hard to find.
+    * **We used it because it was there.** Please hire some fucking software
+      developers and go back to writing elevator pitches and flirting with Y
+      Combinator.
+* **Everybody knows MySQL. It's easy to hire MySQL folks.** It's easy to hire
+  MCSEs, too, but you should be hiring for attitude and ability to learn, not
+  for specific skillsets, if you want to run a successful software project.
+    * **It's popular.** Sure, and nobody ever got fired for buying
+      IBM/Microsoft/Adobe. Popularity isn't any indication of quality, and if
+      we let popularity dictate what technology we use and improve we'll never
+      get anywhere. Marketing software to geeks is _easy_ - it's just that
+      lots of high-quality projects don't bother.
+* **It's lightweight.** So's [SQLite 3](https://www.sqlite.org) or
+  [H2](https://www.h2database.com/html/main.html). If you care about deployment
+  footprint more than any other factor, MySQL is actually pretty clunky (and
+  embedded MySQL has even bigger problems than freestanding MySQL).
+* **It's getting better, so we might as well stay on it.** [It's
+  true](https://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html), if you go
+  by feature checklists and the manual, MySQL is improving “rapidly.” 5.6 is
+  due out soon and superficially looks to contain a number of good changes. I
+  have two problems with this line of reasoning:
+    1. Why wait? Other databases are good _now_, not _eventually_.
+    2. MySQL has a history of providing the bare minimum to satisfy a feature
+    checkbox without actually making the feature work well, work consistently,
+    or work in combination with other features.
author	Alice Bevan-McGregor <alice@gothcandy.com>	2020-04-14 18:06:43 -0400
committer	Alice Bevan-McGregor <alice@gothcandy.com>	2020-04-14 18:06:43 -0400
commit	21f542a157971f90be1dcbadaad9eaa265ba0a62 (patch)
tree	b7a8da50e17bc780ed9242d89a11189c7f5e9452
parent	34708dfa902afabf4833c25233132e56514915de (diff)