It's time.

author: Owen Jacobson <owen@grimoire.ca> 2023-10-05 01:35:02 -0400
committer: Owen Jacobson <owen@grimoire.ca> 2023-10-05 01:35:02 -0400
commit: 20928a1d9207a919b5c2ccab289213caff4002fd (patch)
tree: bb140b948e2ef9a702bd97f2997a8b5eee036a69
parent: 1b60b466e495902abdab9f13f223ae1639ea406d (diff)
1 files changed, 0 insertions, 745 deletions
diff --git a/docs/mysql/choose-something-else.md b/docs/mysql/choose-something-else.md
deleted file mode 100644
index 2631744..0000000
--- a/docs/mysql/choose-something-else.md
+++ /dev/null
@@ -1,745 +0,0 @@
-# Do Not Pass This Way Again
-
-!!! warning
-    I wrote this article in 2013, in what amounts to a fit of pique, and never
-    revisited it. Much of this information is outdated, and you rely on it at
-    your own risk. I restored it at the request of [a reader](https://github.com/grimoire-ca/bliki/pull/2).
-
-    The tone and structure of this article also reflects an angrier and much
-    less understanding person than the one I try to be today. Don't let my
-    anger be your cudgel.
-
-Considering MySQL? Use something else. Already on MySQL? Migrate. For every
-successful project built on MySQL, you could uncover a history of time wasted
-mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense
-of accomplishment over the effort spent making MySQL behave.
-
-Thesis: databases fill roles ranging from pure storage to complex and
-interesting data processing; MySQL is differently bad at both tasks. Real apps
-all fall somewhere between these poles, and suffer variably from both sets of
-MySQL flaws.
-
-* MySQL is bad at [storage](#storage).
-* MySQL is bad at [data processing](#data-processing).
-* MySQL is bad [by design](#by-design).
-* [Bad arguments](#bad-arguments) for using MySQL.
-
-Much of this is inspired by the principles behind [PHP: A Fractal of Bad
-Design](http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/). I
-suggest reading that article too -- it's got a lot of good thought in it even
-if you already know to stay well away from PHP. (If that article offends you,
-well, this page probably will too.)
-
-## Storage
-
-Storage systems have four properties:
-
-1. Take and store data they receive from applications.
-2. Keep that data safe against loss or accidental change.
-3. Provide stored data to applications on demand.
-4. Give administrators effective management tools.
-
-In a truly “pure” storage application, data-comprehension features
-(constraints and relationships, nontrivial functions and aggregates) would go
-totally unused. There is a time and a place for this: the return of “NoSQL”
-storage systems attests to that.
-
-Pure storage systems tend to be closely coupled to their “main” application:
-consider most web/server app databases. “Secondary” clients tend to be
-read-only (reporting applications, monitoring) or to be utilities in service
-of the main application (migration tools, documentation tools). If you believe
-constraints, validity checks, and other comprehension features can be
-implemented in “the application,” you are probably thinking of databases close
-to this pole.
-
-### Storing Data
-
-MySQL has many edge cases which reduce the predictability of its behaviour
-when storing information. Most of these edge cases are documented, but violate
-the principle of least surprise (not to mention the expectations of users
-familiar with other SQL implementations).
-
-* Implicit conversions (particularly to and from string types) can modify
-  MySQL's behaviour.
-    * Many implicit conversions are also silent (no warning, no diagnostic),
-      by design, making it more likely developers are entirely unaware of
-      them until one does something surprising.
-* Conversions that violate basic constraints (range, length) of the output
-  type often coerce data rather than failing.
-    * Sometimes this raises a warning; does your app check for those?
-    * This behaviour is unlike many typed systems (but closely like PHP and
-      remotely like Perl).
-* Conversion behaviour depends on a per-connection configuration value
-  (`sql_mode`) that has [a large constellation of possible
-  states](https://dev.mysql.com/doc/refman/5.5/en/sql-mode.html), making
-  it harder to carry expectations from manual testing over to code or from
-  tool to tool.
-* MySQL recommends UTF-8 as a character-set, but still defaults to Latin-1.
-  The implimentation of `utf8` up until MySQL 5.5 was only the 3-byte
-  [BMP](https://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane).
-  MySQL 5.5 and beyond supports a 4-byte `utf8`, but confusingly must be set
-  with the character-set `utf8mb4`. Implementation details of these encodings
-  within MySQL, such as the `utf8` 3-byte limit, tend to leak out into client
-  applications. Data that does not fit MySQL's understanding of the storage
-  encoding will be transformed until it does, by truncation or replacement, by
-  default.
-    * Collation support is per-encoding, with one of the stranger default
-      configurations: by default, the collation orders characters according to
-      Swedish alphabetization rules, case-insensitively.
-    * Since it's the default, lots of folks who don't know the manual
-      inside-out and backwards observe MySQL's case-insensitive collation
-      behaviour (`'a' = 'A'`) and conclude that “MySQL is case-insensitive,”
-      complicating any effort to use a case-sensitive locale.
-    * Both the encoding and the collation can vary, independently, by
-      _column_. Do you keep your schema definition open when you write
-      queries to watch out for this sort of shit?
-* The `TIMESTAMP` type tries to do something smart by storing values in a
-  canonical timezone (UTC), but it's done with so few affordances that it's
-  very hard to even _tell_ that MySQL's done a right thing with your data.
-    * And even after that, the result of `foo < '2012-04-01 09:00:00'` still
-      depends on what time of year it is when you evaluate the query, unless
-      you're very careful with your connection timezone.
-    * `TIMESTAMP` is also special-cased in MySQL's schema definition handling,
-      making it easy to accidentally create (or to accidentally fail to
-      create) an auto-updating field when you didn't (did) want one.
-    * `DATETIME` does not get the same timezone handling `TIMESTAMP` does.
-      What? And you can't provide your own without resorting to hacks like
-      extra columns.
-    * Oh, did you want to _use_ MySQL's timezone support? Too bad, none of
-      that data's loaded by default. You have to process the OS's `tzinfo`
-      files into SQL with a separate tool and import that. If you ever want to
-      update MySQL's timezone settings later, you need to take the server down
-      just to make sure the changes apply.
-
-### Preserving Data
-
-... against unexpected changes: like most disk-backed storage systems, MySQL
-is as reliable as the disks and filesystems its data lives on. MySQL provides
-no additional functionality in terms of mirroring or hardware failure tolerance
-(such as [Oracle ASM](https://en.wikipedia.org/wiki/Automatic_Storage_Management)).
-However this is a limitation shared with many, _many_ other systems.
-
-When using the InnoDB storage engine (default since MySQL 5.5), MySQL maintains page
-checksums in order to detect corruption caused by underlying storage.  However,
-many third-party software applications, as sell as users upgrading
-from earlier versions of MySQL may be using MyISAM, which will frequently corrupt
-data files on improper shutdown.
-
-The implicit conversion rules that bite when storing data also bite when
-asking MySQL to modify data - my favourite example being a fat-fingered
-`UPDATE` query where a mistyped `=` (as `-`, off by a single key) caused 90%
-of the rows in the table to be affected, instead of one row, because of
-implicit string-to-integer conversions.
-
-... against loss: hoo boy. MySQL, out of the box, gives you three approaches
-to [backups](https://dev.mysql.com/doc/refman/5.5/en/backup-methods.html):
-
-* Take “blind” filesystem backups with `tar` or `rsync`. Unless you
-  meticulously lock tables or make the database read-only for the duration,
-  this produces a backup that requires crash recovery before it will be
-  usable, and can produce an inconsistent database.
-    * This can bite quite hard if you use InnoDB, as InnoDB crash recovery
-      takes time proportional to both the number of InnoDB tables and the
-      total size of InnoDB tables, with a large constant.
-* Dump to SQL with `mysqldump`: slow, relatively large backups, and
-  non-incremental.
-* Archive binary logs: fragile, complex, over-configurable, and configured
-  badly by default. (Binary logging is also the basis of MySQL's replication
-  system.)
-
-If neither of these are sufficient, you're left with purchasing [a backup tool
-from
-Oracle](https://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_mysql_enterprise_backup)
-or from one of the third-party MySQL vendors.
-
-Like many of MySQL's features, the binary logging feature is
-[too](https://dev.mysql.com/doc/refman/5.5/en/binary-log.html)
-[configurable](https://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html),
-while still, somehow, defaulting to modes that are hazardous or surprising:
-the
-[default](https://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html#sysvar_binlog_format)
-[behaviour](https://dev.mysql.com/doc/refman/5.5/en/replication-formats.html)
-is to log SQL statements, rather than logging their side effects. This has
-lead to numerous bugs over the years; MySQL (now) makes an effort to make
-common “non-deterministic” cases such as `NOW()` and `RANDOM()` act
-deterministically but these have been addressed using ad-hoc solutions.
-Restoring binary-log-based backups can easily lead to data that differs from
-the original system, and by the time you've noticed the problem, it's too late
-to do anything about it.
-
-(Seriously. The binary log entries for each statement contain the “current”
-time on the master and the random seed at the start of the statement, just in
-case. If your non-deterministic query uses any other function, you're still
-[fucked by
-default](https://dev.mysql.com/doc/refman/5.5/en/replication-sbr-rbr.html#replication-sbr-rbr-sbr-disadvantages).)
-
-Additionally, a number of apparently-harmless features can lead to backups or
-replicas wandering out of sync with the original database, in the default
-configuration:
-
-* `AUTO_INCREMENT` and `UPDATE` statements.
-* `AUTO_INCREMENT` and `INSERT` statements (sometimes). SURPRISE.
-* Triggers.
-* User-defined (native) functions.
-* Stored (procedural SQL) functions.
-* `DELETE ... LIMIT` and `UPDATE ... LIMIT` statements, though if you use
-  these, you've misunderstood how SQL is supposed to work.
-* `INSERT ... ON DUPLICATE KEY UPDATE` statements.
-* Bulk-loading data with `LOAD DATA` statements.
-* [Operations on floating-point
-  values](https://dev.mysql.com/doc/refman/5.5/en/replication-features-floatvalues.html).
-
-### Retrieving Data
-
-This mostly works as expected. Most of the ways MySQL will screw you happen
-when you store data, not when you retrieve it. However, there are a few things
-that implicitly transform stored data before returning it:
-
-* MySQL's surreal type conversion system works the same way during `SELECT`
-  that it works during other operations, which can lead to queries matching
-  unexpected rows:
-
-        owen@scratch> CREATE TABLE account (
-            ->     accountid INTEGER
-            ->         AUTO_INCREMENT
-            ->         PRIMARY KEY,
-            ->     discountid INTEGER
-            -> );
-        Query OK, 0 rows affected (0.54 sec)
-
-        owen@scratch> INSERT INTO account
-            ->     (discountid)
-            -> VALUES
-            ->     (0),
-            ->     (1),
-            ->     (2);
-        Query OK, 3 rows affected (0.03 sec)
-        Records: 3  Duplicates: 0  Warnings: 0
-
-        owen@scratch> SELECT *
-            -> FROM account
-            -> WHERE discountid = 'banana';
-        +-----------+------------+
-        | accountid | discountid |
-        +-----------+------------+
-        |         1 |          0 |
-        +-----------+------------+
-        1 row in set, 1 warning (0.05 sec)
-
-    Ok, unexpected, but there's at least a warning (do your apps check for
-    those?) - let's see what it says:
-
-        owen@scratch> SHOW WARNINGS;
-        +---------+------+--------------------------------------------+
-        | Level   | Code | Message                                    |
-        +---------+------+--------------------------------------------+
-        | Warning | 1292 | Truncated incorrect DOUBLE value: 'banana' |
-        +---------+------+--------------------------------------------+
-        1 row in set (0.03 sec)
-
-    I can count on one hand the number of `DOUBLE` columns in this example and
-    still have five fingers left over.
-
-    You might think this is an unreasonable example: maybe you should always
-    make sure your argument types exactly match the field types, and the query
-    should use `57` instead of `'banana'`. (This does actually “fix” the
-    problem.) It's unrealistic to expect every single user to run `SHOW CREATE
-    TABLE` before every single query, or to memorize the types of every column
-    in your schema, though. This example derived from a technically-skilled
-    but MySQL-ignorant tester examining MySQL data to verify some behavioural
-    changes in an app.
-
-    * Actually, you don't even need a table for this: `SELECT 0 = 'banana'`
-      returns `1`. Did the [PHP](http://phpsadness.com/sad/52) folks design
-      MySQL's `=` operator?
-
-    * This isn't affected by `sql_mode`, even though so many other things are.
-
-* `TIMESTAMP` columns (and _only_ `TIMESTAMP` columns) can return
-  apparently-differing values for the same stored value depending on
-  per-connection configuration even during read-only operation. This is done
-  silently and the default behaviour can change as a side effect of non-MySQL
-  configuration changes in the underlying OS.
-* String-typed columns are transformed for encoding on output if the
-  connection is not using the same encoding as the underlying storage, using
-  the same rules as the transformation on input.
-* Values that stricter `sql_mode` settings would reject during storage can
-  still be returned during retrieval; it is impossible to predict in advance
-  whether such data exists, since clients are free to set `sql_mode` to any
-  value at any time.
-
-### Efficiency
-
-For purely store-and-retrieve applications, MySQL's query planner (which
-transforms the miniature program contained in each SQL statement into a tree
-of disk access and data manipulation steps) is sufficient, but only barely.
-Queries that retrieve data from one table, or from one table and a small
-number of one-to-maybe-one related tables, produce relatively efficient plans.
-
-MySQL, however, offers a number of tuning options that can have dramatic and
-counterintuitive effects, and the documentation provides very little advice
-for choosing settings. Tuning relies on the administrator's personal
-experience, blog articles of varying quality, and consultants.
-
-* The MySQL query cache defaults to a non-zero size in some commonly-installed
-  configurations. However, the larger the cache, the slower writes proceed:
-  invalidating cache entries that include the tables modified by a query means
-  considering every entry in the cache. This cache also uses MySQL's LRU
-  implementation, which has its own performance problems during eviction that
-  get worse with larger cache sizes.
-* Memory-management settings, including `key_buffer_size` and `innodb_buffer_pool_size`,
-  have non-linear relationships with performance. The [standard](http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/)
-  [advice](http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/) advises
-  making whichever value you care about more to a large value, but this can be
-  counterproductive if the related data is larger than the pool can hold:
-  MySQL is once again bad at discarding old buffer pages when the buffer is
-  exhausted, leading to dramatic slowdowns when query load reaches a certain
-  point.
-    * This also affects filesystem tuning settings such as `table_open_cache`.
-* InnoDB, out of the box, comes configured to use one large (and automatically
-  growing) tablespace file for all tables, complicating backups and storage
-  management. This is fine for trivial databases, but MySQL provides no tools
-  (aside from `DROP TABLE` and reloading the data from an SQL dump) for
-  transplanting a table to another tablespace, and provides no tools (aside
-  from a filesystem-level `rm`, and reloading _all_ InnoDB data from an SQL
-  dump) for reclaiming empty space in a tablespace file.
-* MySQL itself provides very few tools to manage storage; tasks like storing
-  large or infrequently-accessed tables and databases on dedicated filesystems
-  must be done on the filesystem, with MySQL shut down.
-
-## Data Processing
-
-Data processing encompasses tasks that require making decisions about data and
-tasks that derive new data from existing data. This is a huge range of topics:
-
-* Deciding (and enforcing) application-specific validity rules.
-* Summarizing and deriving data.
-* Providing and maintaining alternate representations and structures.
-* Hosting complex domain logic near the data it operates on.
-
-The further towards data processing tasks applications move, the more their
-SQL resembles tiny programs sent to the data. MySQL is totally unprepared for
-programs, and expects SQL to retrieve or modify simple rows.
-
-### Validity
-
-Good constraints are like `assert`s: in an ideal world, you can't tell if they
-work, because your code never violates them. Here in the real world,
-constraint violations happen for all sorts of reasons, ranging from buggy code
-to buggy human cognition. A good database gives you more places to describe
-your expectations and more tools for detecting and preventing surprises.
-MySQL, on the other hand, can't validate your data for you, beyond simple (and
-fixed) type constraints:
-
-* As with the data you store in it, MySQL feels free to change your table
-  definitions [implicitly and
-  silently](https://dev.mysql.com/doc/refman/5.5/en/silent-column-changes.html).
-  Many of these silent schema changes have important performance and
-  feature-availability implications.
-    * Foreign keys are ignored if you spell them certain, common, ways:
-
-            CREATE TABLE foo (
-                -- ...,
-                parent INTEGER
-                    NOT NULL
-                    REFERENCES foo_parent (id)
-                -- , ...
-            )
-
-        silently ignores the foreign key specification, while
-
-            CREATE TABLE foo (
-                -- ...,
-                parent INTEGER
-                    NOT NULL,
-                FOREIGN KEY (parent)
-                    REFERENCES foo_parent (id)
-                -- , ...
-            )
-
-        preserves it.
-
-* Foreign keys, one of the most widely-used database validity checks, are an
-  engine-specific feature, restricting their availability in combination with
-  other engine-specific features. (For example, a table cannot have both
-  foreign key constraints and full-text indexes, as of MySQL 5.5.)
-    * Configurations that violate assumptions about foreign keys, such as a
-      foreign key pointing into a MyISAM or NDB table, do not cause warnings
-      or any other diagnostics. The foreign key is simply discarded. SURPRISE.
-      (MySQL is riddled with these sorts of surprises, and apologists lean
-      very heavily on the “that's documented” excuse for its bad behaviour.)
-* The MySQL parser recognizes `CHECK` clauses, which allow schema developers
-  to make complex declarative assertions about tuples in the database, but
-  [discards them without
-  warning](https://dev.mysql.com/doc/refman/5.5/en/create-table.html). If you
-  want `CHECK`-like constraints, you must implement them as triggers - but see
-  below...
-* MySQL's comprehension of the `DEFAULT` clause is, uh, limited: only
-  constants are permitted, except for the [special
-  case](https://dev.mysql.com/doc/refman/5.5/en/timestamp-initialization.html)
-  of at most one `TIMESTAMP` column per table and at most one sequence-derived
-  column. Who designed this mess?
-    * Furthermore, there's no way to say “no default” and raise an error when
-      an INSERT forgets to provide a value. The default `DEFAULT` is either
-      `NULL` or a zero-like constant (`0`, `''`, and so on). Even for types
-      with no meaningful zero-like values (`DATETIME`).
-* MySQL has no mechanism for introducing new types, which might otherwise
-  provide a route to enforcing validity. Counting the number of special cases
-  in MySQL's [existing type
-  system](https://dev.mysql.com/doc/refman/5.5/en/data-types.html) illustrates
-  why that's probably unfixable.
-
-I hope every client with write access to your data is absolutely perfect,
-because MySQL _cannot help you_ if you make a mistake.
-
-### Summarizing and Deriving Data
-
-SQL databases generally provide features for doing “interesting” things with
-sets of tuples, and MySQL is no exception. However, MySQL's limitations mean
-that actually processing data in the database is fraught with wasted money,
-brains, and time:
-
-* Aggregate (`GROUP BY`) queries run up against limits in MySQL's query
-  planner: a query with both `WHERE` and `GROUP BY` clauses can only satisfy
-  one constraint or the other with indexes, unless there's an index that
-  covers all the relevant fields in both clauses, in the right order. (What
-  this order is depends on the complexity of the query and on the distribution
-  of the underlying data, but that's hardly MySQL-specific.)
-    * If you have all three of `WHERE`, `GROUP BY`, and `ORDER BY` in the same
-      query, you're more or less fucked. Good luck designing a single index
-      that satisfies all three.
-* Even though MySQL allows database administrators to [define normal functions
-  in a procedural SQL
-  dialect](https://dev.mysql.com/doc/refman/5.5/en/create-procedure.html),
-  [custom aggregate
-  functions](https://dev.mysql.com/doc/refman/5.5/en/create-function-udf.html)
-  can only be defined by native plugins. Good thing, too, because procedural
-  SQL in MySQL is its own kind of awful - more on that below.
-* Subqueries are often convenient and occasionally necessary for expressing
-  multi-step transformations on some underlying data. MySQL's query planner
-  has only one strategy for optimizing them: evaluate the innermost query as
-  written, into an in-memory table, then use a nested loop to satisfy joins or
-  `IN` clauses. For large subquery results or interestingly nested subqueries,
-  this is absurdly slow.
-    * MySQL's query planner can't fold constraints from outer queries into
-      subqueries.
-    * The generated in-memory table never has any indexes, ever, even when
-      appropriate indexes are “obvious” from the surrounding query; you cannot
-      even specify them.
-    * These limitations also affect views, which are evaluated as if they were
-      subqueries. In combination with the lack of constraint folding in the
-      planner, this makes filtering or aggregating over large views completely
-      impractical.
-    * MySQL lacks [common table
-      expressions](https://www.postgresql.org/docs/9.2/static/queries-with.html).
-      Even if subquery efficiency problems get fixed, the inability to give
-      meaningful names to subqueries makes them hard to read and comprehend.
-    * I hope you like `CREATE TEMPORARY TABLE AS SELECT`, because that's your
-      only real alternative.
-* [Window
-  functions](https://en.wikipedia.org/wiki/Select_(SQL)#Window_function) do not
-  exist at all in MySQL. This complicates many kinds of analysis, including
-  time series analyses and ranking analyses.
-    * Specific cases (for example, assigning rank numbers to rows) can be
-      implemented using [server-side variables and side effects during
-      `SELECT`](https://stackoverflow.com/questions/6473800/assigning-row-rank-numbers).
-      What? Good luck understanding that code in six months.
-* Even interesting joins run into trouble. MySQL's query planner has trouble
-  with a number of cases that can easily arise in well-normalized data:
-    * Joining and ordering by rows from multiple tables often forces MySQL to
-      dump the whole join to a temporary table, then sort it -- awful,
-      especially if you then use `LIMIT BY` to paginate the results.
-    * `JOIN` clauses with non-trivial conditions, such as joins by range or
-      joins by similarity, generally cause the planner to revert to table
-      scans even if the same condition would be indexable outside of a join.
-    * Joins with `WHERE` clauses that span both tables, where the rows
-      selected by the `WHERE` clause are outliers relative to the table
-      statistics, often cause MySQL to access tables in suboptimal order.
-* Ok, forget about interesting joins. Even interesting `WHERE` clauses can run
-  into trouble: MySQL can't index deterministic functions of a row, either.
-  While some deterministic functions can be eliminated from the `WHERE` clause
-  using simple algebra, many useful cases (whitespace-insensitive comparison,
-  hash-based comparisons, and so on) can't.
-    * You can fake these by storing the computed value in the row alongside
-      the “real” value. This leaves your schema with some ugly data repetition
-      and a chance for the two to fall out of sync, and clients must use the
-      “computed” column explicitly.
-    * Oh, and they must maintain the “computed” version explicitly.
-    * Or you can use triggers. Ha. See above.
-
-And now you know why MySQL advocates are such big fans of doing data
-_processing_ in “the client” or “the app.”
-
-### Alternate Representations and Derived Tables
-
-Many databases let schema designers and administrators abstract the underlying
-“physical” table structure from the presentation given to clients, or to some
-specific clients, for any of a number of reasons. MySQL tries to let you do
-this, too! And fumbles it quite badly.
-
-* As mentioned above, non-trivial views are basically useless. Queries like
-  `SELECT some columns FROM a_view WHERE id = 53` are evaluated in the
-  stupidest -- and slowest -- possible way. Good luck hiding unusual
-  partitioning arrangements or a permissions check in a view if you want any
-  kind of performance.
-* The poor interactions between triggers and binary logging's default
-  configuration make it impractical to use triggers to maintain “materialized”
-  views to avoid the problems with “real” views.
-    * It also effectively means triggers can't be used to emulate `CHECK`
-      constraints and other consistency features.
-    * Code to maintain materialized views is also finicky and hard to get
-      “right,” especially if the view includes aggregates or interesting joins
-      over its source data. I hope you enjoy debugging MySQL's procedural
-      SQL…
-* For the relatively common case of wanting to abstract partitioned storage
-  away for clients, MySQL actually has [a
-  tool](https://dev.mysql.com/doc/refman/5.5/en/partitioning.html) for it! But
-  it comes with [enough caveats to strangle a
-  horse](https://dev.mysql.com/doc/refman/5.5/en/partitioning-limitations.html):
-    * It's a separate table engine wrapping a “real” storage engine, which
-      means it has its own, separate support for engine-specific features:
-      transactions, foreign keys, and index types, `AUTO_INCREMENT`, and
-      others. The syntax for configuring partitions makes selecting the wrong
-      underlying engine entirely too easy, too.
-    * Partitioned tables may not be the referrent of foreign keys: you can't
-      have both enforced relationships and this kind of storage management.
-    * MySQL doesn't actually know how to store partitions on separate disks or
-      filesystems. You still need to reach underneath of MySQL do to actual
-      storage management.
-        * Partitioning an InnoDB table under the default InnoDB configuration
-          stores all of the partitions in the global tablespace file anyways.
-          Helpful! For per-table configurations, they still all end up
-          together in the same file. Partitioning InnoDB tables is a waste of
-          time for managing storage.
-    * TL,DR: MySQL's partition support is so finicky and limited that
-      MySQL-based apps tend to opt for multiple MySQL servers (“sharding”)
-      instead.
-
-### Hosting Logic In The Database
-
-Yeah, yeah, the usual reaction to stored procedures and in-DB code is “eww,
-yuck!” for some not-terrible reasons, but hear me out on two points:
-
-* Under the freestanding-database-server paradigm, there will usually be
-  network latency between database clients and the database itself. There are
-  two ways to minimize the impact of that: move the data to the code in bulk
-  to minimize round-trips, or move the code to the data.
-* Some database administration tasks are better implemented using in-database
-  code than as freestanding clients: complex data migrations that can't be
-  expressed as freestanding SQL queries, for example.
-
-MySQL, as of version
-[5.0](https://dev.mysql.com/doc/relnotes/mysql/5.0/en/news-5-0-0.html)
-(released in 2003 -- remember that date, I'll come back to it), has support
-for in-database code via a procedural SQL-like dialect, like many other SQL
-databases. This includes server-side procedures (blocks of stored code that
-are invoked outside of any other statements and return statement-like
-results), functions (blocks of stored code that compute a result, used in any
-expression context such as a `SELECT` list or `WHERE` clause), and triggers
-(blocks of stored code that run whenever a row is created, modified, or
-deleted).
-
-Given the examples of
-[other](https://www.postgresql.org/docs/7.3/static/plpgsql.html)
-[contemporaneous](https://msdn.microsoft.com/en-US/library/ms189826(v=sql.90).aspx)
-[procedural](https://docs.oracle.com/cd/B10501_01/appdev.920/a96624/toc.htm)
-[languages](https://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd15-psql.html),
-MySQL's procedural dialect -- an implementation of the
-[SQL/PSM](https://en.wikipedia.org/wiki/SQL/PSM) language -- is quite limited:
-
-* There is no language construct for looping over a query result. This seems
-  like a pretty fundamental feature for a database-hosted language, but no.
-* There is no language construct for looping while a condition holds. This
-  seems like a pretty fundamental feature for an imperative language designed
-  any time after about 1975, but no.
-* There is no language construct for looping over a range.
-* There is, in fact, one language construct for looping: the unconditional
-  loop. All other iteration control is done via conditional `LEAVE`
-  statements, as
-
-        BEGIN
-            DECLARE c CURSOR FOR
-                SELECT foo, bar, baz
-                FROM some_table
-                WHERE some_condition;
-            DECLARE done INT DEFAULT 0;
-            DECLARE CONTINUE HANDLER FOR NOT FOUND
-                SET done = 1;
-
-            DECLARE c_foo INTEGER;
-            DECLARE c_bar INTEGER;
-            DECLARE c_baz INTEGER;
-
-            OPEN c;
-            process_some_table: LOOP
-                FETCH c INTO c_foo, c_bar, c_baz;
-                IF done THEN
-                    LEAVE process_some_table;
-                END IF;
-
-                -- do something with c_foo, c_bar, c_baz
-            END LOOP;
-        END;
-
-    The original “structured programming” revolution in the 1960s seems to
-    have passed the MySQL team by.
-
-* Okay, I lied. There are two looping constructs: there's also the `REPEAT ...
-  UNTIL condition END REPEAT` construct, analogous to C's `do {} while
-  (!condition);` loop. But you still can't loop over query results, and you
-  can't run zero iterations of the loop's main body this way.
-* There is nothing resembling a modern exception system with automatic scoping
-  of handlers or declarative exception management. Error handling is entirely
-  via Visual Basic-style “on condition X, do Y” instructions, which remain in
-  effect for the rest of the program's execution.
-    * In the language shipped with MySQL 5.0, there wasn't a way to signal
-      errors, either: programmers had to resort to stunts like [intentionally
-      issuing failing
-      queries](https://stackoverflow.com/questions/465727/raise-error-within-mysql-function),
-      instead. Later versions of the language addressed this with the
-      [`SIGNAL`
-      statement](https://dev.mysql.com/doc/refman/5.5/en/signal.html): see,
-      they _can_ learn from better languages, eventually.
-* You can't escape to some other language, since MySQL doesn't have an
-  extension mechanism for server-side languages or a good way to call
-  out-of-process services during queries.
-
-The net result is that developing MySQL stored programs is unpleasant,
-uncomfortable, and far more error-prone than it could have been.
-
-## Why Is MySQL The Way It Is? { #by-design }
-
-MySQL's technology and history contain the seeds of all of these flaws.
-
-### Pluggable Storage Engines
-
-Very early in MySQL's life, the MySQL dev team realized that MyISAM was not
-the only way to store data, and opted to support other storage backends within
-MySQL. This is basically an alright idea; while I personally prefer storage
-systems that focus their effort on making one backend work very well,
-supporting multiple backends and letting third-party developers write their
-own is a pretty good approach too.
-
-Unfortunately, MySQL's storage backend interface puts a very low ceiling on
-the ways storage backends can make MySQL behave better.
-
-MySQL's data access paths through table engines are very simple: MySQL asks
-the engine to open a table, asks the engine to iterate through the table
-returning rows, filters the rows itself (outside of the storage engine), then
-asks the engine to close the table. Alternately, MySQL asks the engine to open
-a table, asks the engine to retrieve rows in range or for a single value over
-a specific index, filters the rows itself, and asks the engine to close the
-table.
-
-This simplistic interface frees table engines from having to worry about query
-optimization - in theory. Unfortunately, engine-specific features have a large
-impact on the performance of various query plans, but the channels back to the
-query planner provide very little granularity for estimating cost and prevent
-the planner from making good use of the engine in unusual cases. Conversely,
-the table engine system is totally isolated from the actual query, and can't
-make query-dependent performance choices “on its own.” There's no third path;
-the query planner itself is not pluggable.
-
-Similar consequences apply to type checking, support for new types, or even
-something as “obvious” as multiple automatic `TIMESTAMP` columns in the same
-table.
-
-Table manipulation -- creation, structural modification, and so on -- runs
-into similar problems. MySQL itself parses each `CREATE TABLE` statement, then
-hands off a parsed representation to the table engine so that it can manage
-storage. The parsed representation is lossy: there are plenty of forms MySQL's
-parser recognizes that aren't representable in a `TABLE` structure, preventing
-engines from implementing, say, column or tuple `CHECK` constraints without
-MySQL's help.
-
-The [sheer number of table
-engines](https://dev.mysql.com/doc/refman/5.5/en/storage-engines.html) makes
-that help very slow in coming. Any change to the table engine interface means
-perturbing the code to each engine, making progress on new MySQL-level
-features that interact with storage such as better query planning or new SQL
-constructs necessarily slow to implement and slow to test.
-
-### Held Back By History
-
-The original MySQL team focused on pure read performance and on “ease of use”
-(for new users with simple needs, as far as I can tell) over correctness and
-completeness, violating Knuth's laws of optimization. Many of these decisions
-locked MySQL into behaviours very early in its life that it still displays
-now. Features like implicit type conversions legitimately do help streamline
-development in very simple cases; experience with [other
-languages](http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/)
-unfortunately shows that the same behaviours sandbag development and help hide
-bugs in more sophisticated scenarios.
-
-MySQL has since changed hands, and the teams working on MySQL (and MariaDB,
-and Percona) are much more mature now than the team that made those early
-decisions. MySQL's massive and frequently non-savvy userbase makes it very hard
-to introduce breaking changes. At the same time, adding _optional_ breaking
-changes via server and client mode flags (such as `sql_mode`) increases the
-cognitive overhead of understanding MySQL's behaviours -- especially when that
-behaviour can vary from client to client, or when the server's configuration is
-out of the user's control (for example, on a shared host, or on EC2).
-
-A solution similar to Python's `from __future__ import` pragmas for making
-breaking changes opt-in some releases in advance of making them mandatory
-might help, but MySQL doesn't have the kind of highly-invested, highly-skilled
-user base that would make that effective -- and it still has all of the
-problems of modal behaviour.
-
-## Bad Arguments
-
-Inevitably, someone's going to come along and tell me how wrong I am and how
-MySQL is just fine as a database system. These people are everywhere, and they
-mean well too, and they are almost all wrong. There are two good reasons to
-use MySQL:
-
-1. **Some earlier group wrote for it, and we haven't finished porting our code
-    off of MySQL.**
-2. **We've considered all of these points, and many more, and decided that
-    `___feature_x___` that MySQL offers is worth the hassle.**
-
-Unfortunately, these aren't the reasons people do give, generally. The
-following are much more common:
-
-* **It's good enough.** No it ain't. There are plenty of other equally-capable
-  data storage systems that don't come with MySQL's huge raft of edge cases
-  and quirks.
-    * **We haven't run into these problems.** Actually, a lot of these
-      problems happen _silently_. Odds are, unless you write your queries and
-      schema statements with the manual open and refer back to it constantly,
-      or have been using MySQL since the 3.x era _daily_, at least some of
-      these issues have bitten you. The ones that prevent you from using your
-      database intelligently are very hard to notice in action.
-* **We already know how to use it.** MySQL development and administration
-  causes brain damage, folks, the same way PHP does. Where PHP teaches
-  programmers that “array” is the only structure you need, MySQL teaches
-  people that databases are awkward, slow, hard-to-tune monsters that require
-  constant attention. That doesn't have to be true; there are comfortable,
-  fast, and easily-tuned systems out there that don't require daily care and
-  feeding or the love of a specialist.
-* **It's the only thing our host supports.** [Get](https://linode.com/) [a](https://www.heroku.com/) [better](https://gandi.net/) [host](https://www.engineyard.com). It's
-  not like they're expensive or hard to find.
-    * **We used it because it was there.** Please hire some fucking software
-      developers and go back to writing elevator pitches and flirting with Y
-      Combinator.
-* **Everybody knows MySQL. It's easy to hire MySQL folks.** It's easy to hire
-  MCSEs, too, but you should be hiring for attitude and ability to learn, not
-  for specific skillsets, if you want to run a successful software project.
-    * **It's popular.** Sure, and nobody ever got fired for buying
-      IBM/Microsoft/Adobe. Popularity isn't any indication of quality, and if
-      we let popularity dictate what technology we use and improve we'll never
-      get anywhere. Marketing software to geeks is _easy_ - it's just that
-      lots of high-quality projects don't bother.
-* **It's lightweight.** So's [SQLite 3](https://www.sqlite.org) or
-  [H2](https://www.h2database.com/html/main.html). If you care about deployment
-  footprint more than any other factor, MySQL is actually pretty clunky (and
-  embedded MySQL has even bigger problems than freestanding MySQL).
-* **It's getting better, so we might as well stay on it.** [It's
-  true](https://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html), if you go
-  by feature checklists and the manual, MySQL is improving “rapidly.” 5.6 is
-  due out soon and superficially looks to contain a number of good changes. I
-  have two problems with this line of reasoning:
-    1. Why wait? Other databases are good _now_, not _eventually_.
-    2. MySQL has a history of providing the bare minimum to satisfy a feature
-    checkbox without actually making the feature work well, work consistently,
-    or work in combination with other features.
author	Owen Jacobson <owen@grimoire.ca>	2023-10-05 01:35:02 -0400
committer	Owen Jacobson <owen@grimoire.ca>	2023-10-05 01:35:02 -0400
commit	20928a1d9207a919b5c2ccab289213caff4002fd (patch)
tree	bb140b948e2ef9a702bd97f2997a8b5eee036a69
parent	1b60b466e495902abdab9f13f223ae1639ea406d (diff)