summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorOwen Jacobson <owen@grimoire.ca>2023-10-05 01:35:02 -0400
committerOwen Jacobson <owen@grimoire.ca>2023-10-05 01:35:02 -0400
commit20928a1d9207a919b5c2ccab289213caff4002fd (patch)
treebb140b948e2ef9a702bd97f2997a8b5eee036a69
parent1b60b466e495902abdab9f13f223ae1639ea406d (diff)
It's time.
-rw-r--r--docs/mysql/choose-something-else.md745
1 files changed, 0 insertions, 745 deletions
diff --git a/docs/mysql/choose-something-else.md b/docs/mysql/choose-something-else.md
deleted file mode 100644
index 2631744..0000000
--- a/docs/mysql/choose-something-else.md
+++ /dev/null
@@ -1,745 +0,0 @@
-# Do Not Pass This Way Again
-
-!!! warning
- I wrote this article in 2013, in what amounts to a fit of pique, and never
- revisited it. Much of this information is outdated, and you rely on it at
- your own risk. I restored it at the request of [a reader](https://github.com/grimoire-ca/bliki/pull/2).
-
- The tone and structure of this article also reflects an angrier and much
- less understanding person than the one I try to be today. Don't let my
- anger be your cudgel.
-
-Considering MySQL? Use something else. Already on MySQL? Migrate. For every
-successful project built on MySQL, you could uncover a history of time wasted
-mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense
-of accomplishment over the effort spent making MySQL behave.
-
-Thesis: databases fill roles ranging from pure storage to complex and
-interesting data processing; MySQL is differently bad at both tasks. Real apps
-all fall somewhere between these poles, and suffer variably from both sets of
-MySQL flaws.
-
-* MySQL is bad at [storage](#storage).
-* MySQL is bad at [data processing](#data-processing).
-* MySQL is bad [by design](#by-design).
-* [Bad arguments](#bad-arguments) for using MySQL.
-
-Much of this is inspired by the principles behind [PHP: A Fractal of Bad
-Design](http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/). I
-suggest reading that article too -- it's got a lot of good thought in it even
-if you already know to stay well away from PHP. (If that article offends you,
-well, this page probably will too.)
-
-## Storage
-
-Storage systems have four properties:
-
-1. Take and store data they receive from applications.
-2. Keep that data safe against loss or accidental change.
-3. Provide stored data to applications on demand.
-4. Give administrators effective management tools.
-
-In a truly “pure” storage application, data-comprehension features
-(constraints and relationships, nontrivial functions and aggregates) would go
-totally unused. There is a time and a place for this: the return of “NoSQL”
-storage systems attests to that.
-
-Pure storage systems tend to be closely coupled to their “main” application:
-consider most web/server app databases. “Secondary” clients tend to be
-read-only (reporting applications, monitoring) or to be utilities in service
-of the main application (migration tools, documentation tools). If you believe
-constraints, validity checks, and other comprehension features can be
-implemented in “the application,” you are probably thinking of databases close
-to this pole.
-
-### Storing Data
-
-MySQL has many edge cases which reduce the predictability of its behaviour
-when storing information. Most of these edge cases are documented, but violate
-the principle of least surprise (not to mention the expectations of users
-familiar with other SQL implementations).
-
-* Implicit conversions (particularly to and from string types) can modify
- MySQL's behaviour.
- * Many implicit conversions are also silent (no warning, no diagnostic),
- by design, making it more likely developers are entirely unaware of
- them until one does something surprising.
-* Conversions that violate basic constraints (range, length) of the output
- type often coerce data rather than failing.
- * Sometimes this raises a warning; does your app check for those?
- * This behaviour is unlike many typed systems (but closely like PHP and
- remotely like Perl).
-* Conversion behaviour depends on a per-connection configuration value
- (`sql_mode`) that has [a large constellation of possible
- states](https://dev.mysql.com/doc/refman/5.5/en/sql-mode.html), making
- it harder to carry expectations from manual testing over to code or from
- tool to tool.
-* MySQL recommends UTF-8 as a character-set, but still defaults to Latin-1.
- The implimentation of `utf8` up until MySQL 5.5 was only the 3-byte
- [BMP](https://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane).
- MySQL 5.5 and beyond supports a 4-byte `utf8`, but confusingly must be set
- with the character-set `utf8mb4`. Implementation details of these encodings
- within MySQL, such as the `utf8` 3-byte limit, tend to leak out into client
- applications. Data that does not fit MySQL's understanding of the storage
- encoding will be transformed until it does, by truncation or replacement, by
- default.
- * Collation support is per-encoding, with one of the stranger default
- configurations: by default, the collation orders characters according to
- Swedish alphabetization rules, case-insensitively.
- * Since it's the default, lots of folks who don't know the manual
- inside-out and backwards observe MySQL's case-insensitive collation
- behaviour (`'a' = 'A'`) and conclude that “MySQL is case-insensitive,”
- complicating any effort to use a case-sensitive locale.
- * Both the encoding and the collation can vary, independently, by
- _column_. Do you keep your schema definition open when you write
- queries to watch out for this sort of shit?
-* The `TIMESTAMP` type tries to do something smart by storing values in a
- canonical timezone (UTC), but it's done with so few affordances that it's
- very hard to even _tell_ that MySQL's done a right thing with your data.
- * And even after that, the result of `foo < '2012-04-01 09:00:00'` still
- depends on what time of year it is when you evaluate the query, unless
- you're very careful with your connection timezone.
- * `TIMESTAMP` is also special-cased in MySQL's schema definition handling,
- making it easy to accidentally create (or to accidentally fail to
- create) an auto-updating field when you didn't (did) want one.
- * `DATETIME` does not get the same timezone handling `TIMESTAMP` does.
- What? And you can't provide your own without resorting to hacks like
- extra columns.
- * Oh, did you want to _use_ MySQL's timezone support? Too bad, none of
- that data's loaded by default. You have to process the OS's `tzinfo`
- files into SQL with a separate tool and import that. If you ever want to
- update MySQL's timezone settings later, you need to take the server down
- just to make sure the changes apply.
-
-### Preserving Data
-
-... against unexpected changes: like most disk-backed storage systems, MySQL
-is as reliable as the disks and filesystems its data lives on. MySQL provides
-no additional functionality in terms of mirroring or hardware failure tolerance
-(such as [Oracle ASM](https://en.wikipedia.org/wiki/Automatic_Storage_Management)).
-However this is a limitation shared with many, _many_ other systems.
-
-When using the InnoDB storage engine (default since MySQL 5.5), MySQL maintains page
-checksums in order to detect corruption caused by underlying storage. However,
-many third-party software applications, as sell as users upgrading
-from earlier versions of MySQL may be using MyISAM, which will frequently corrupt
-data files on improper shutdown.
-
-The implicit conversion rules that bite when storing data also bite when
-asking MySQL to modify data - my favourite example being a fat-fingered
-`UPDATE` query where a mistyped `=` (as `-`, off by a single key) caused 90%
-of the rows in the table to be affected, instead of one row, because of
-implicit string-to-integer conversions.
-
-... against loss: hoo boy. MySQL, out of the box, gives you three approaches
-to [backups](https://dev.mysql.com/doc/refman/5.5/en/backup-methods.html):
-
-* Take “blind” filesystem backups with `tar` or `rsync`. Unless you
- meticulously lock tables or make the database read-only for the duration,
- this produces a backup that requires crash recovery before it will be
- usable, and can produce an inconsistent database.
- * This can bite quite hard if you use InnoDB, as InnoDB crash recovery
- takes time proportional to both the number of InnoDB tables and the
- total size of InnoDB tables, with a large constant.
-* Dump to SQL with `mysqldump`: slow, relatively large backups, and
- non-incremental.
-* Archive binary logs: fragile, complex, over-configurable, and configured
- badly by default. (Binary logging is also the basis of MySQL's replication
- system.)
-
-If neither of these are sufficient, you're left with purchasing [a backup tool
-from
-Oracle](https://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_mysql_enterprise_backup)
-or from one of the third-party MySQL vendors.
-
-Like many of MySQL's features, the binary logging feature is
-[too](https://dev.mysql.com/doc/refman/5.5/en/binary-log.html)
-[configurable](https://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html),
-while still, somehow, defaulting to modes that are hazardous or surprising:
-the
-[default](https://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html#sysvar_binlog_format)
-[behaviour](https://dev.mysql.com/doc/refman/5.5/en/replication-formats.html)
-is to log SQL statements, rather than logging their side effects. This has
-lead to numerous bugs over the years; MySQL (now) makes an effort to make
-common “non-deterministic” cases such as `NOW()` and `RANDOM()` act
-deterministically but these have been addressed using ad-hoc solutions.
-Restoring binary-log-based backups can easily lead to data that differs from
-the original system, and by the time you've noticed the problem, it's too late
-to do anything about it.
-
-(Seriously. The binary log entries for each statement contain the “current”
-time on the master and the random seed at the start of the statement, just in
-case. If your non-deterministic query uses any other function, you're still
-[fucked by
-default](https://dev.mysql.com/doc/refman/5.5/en/replication-sbr-rbr.html#replication-sbr-rbr-sbr-disadvantages).)
-
-Additionally, a number of apparently-harmless features can lead to backups or
-replicas wandering out of sync with the original database, in the default
-configuration:
-
-* `AUTO_INCREMENT` and `UPDATE` statements.
-* `AUTO_INCREMENT` and `INSERT` statements (sometimes). SURPRISE.
-* Triggers.
-* User-defined (native) functions.
-* Stored (procedural SQL) functions.
-* `DELETE ... LIMIT` and `UPDATE ... LIMIT` statements, though if you use
- these, you've misunderstood how SQL is supposed to work.
-* `INSERT ... ON DUPLICATE KEY UPDATE` statements.
-* Bulk-loading data with `LOAD DATA` statements.
-* [Operations on floating-point
- values](https://dev.mysql.com/doc/refman/5.5/en/replication-features-floatvalues.html).
-
-### Retrieving Data
-
-This mostly works as expected. Most of the ways MySQL will screw you happen
-when you store data, not when you retrieve it. However, there are a few things
-that implicitly transform stored data before returning it:
-
-* MySQL's surreal type conversion system works the same way during `SELECT`
- that it works during other operations, which can lead to queries matching
- unexpected rows:
-
- owen@scratch> CREATE TABLE account (
- -> accountid INTEGER
- -> AUTO_INCREMENT
- -> PRIMARY KEY,
- -> discountid INTEGER
- -> );
- Query OK, 0 rows affected (0.54 sec)
-
- owen@scratch> INSERT INTO account
- -> (discountid)
- -> VALUES
- -> (0),
- -> (1),
- -> (2);
- Query OK, 3 rows affected (0.03 sec)
- Records: 3 Duplicates: 0 Warnings: 0
-
- owen@scratch> SELECT *
- -> FROM account
- -> WHERE discountid = 'banana';
- +-----------+------------+
- | accountid | discountid |
- +-----------+------------+
- | 1 | 0 |
- +-----------+------------+
- 1 row in set, 1 warning (0.05 sec)
-
- Ok, unexpected, but there's at least a warning (do your apps check for
- those?) - let's see what it says:
-
- owen@scratch> SHOW WARNINGS;
- +---------+------+--------------------------------------------+
- | Level | Code | Message |
- +---------+------+--------------------------------------------+
- | Warning | 1292 | Truncated incorrect DOUBLE value: 'banana' |
- +---------+------+--------------------------------------------+
- 1 row in set (0.03 sec)
-
- I can count on one hand the number of `DOUBLE` columns in this example and
- still have five fingers left over.
-
- You might think this is an unreasonable example: maybe you should always
- make sure your argument types exactly match the field types, and the query
- should use `57` instead of `'banana'`. (This does actually “fix” the
- problem.) It's unrealistic to expect every single user to run `SHOW CREATE
- TABLE` before every single query, or to memorize the types of every column
- in your schema, though. This example derived from a technically-skilled
- but MySQL-ignorant tester examining MySQL data to verify some behavioural
- changes in an app.
-
- * Actually, you don't even need a table for this: `SELECT 0 = 'banana'`
- returns `1`. Did the [PHP](http://phpsadness.com/sad/52) folks design
- MySQL's `=` operator?
-
- * This isn't affected by `sql_mode`, even though so many other things are.
-
-* `TIMESTAMP` columns (and _only_ `TIMESTAMP` columns) can return
- apparently-differing values for the same stored value depending on
- per-connection configuration even during read-only operation. This is done
- silently and the default behaviour can change as a side effect of non-MySQL
- configuration changes in the underlying OS.
-* String-typed columns are transformed for encoding on output if the
- connection is not using the same encoding as the underlying storage, using
- the same rules as the transformation on input.
-* Values that stricter `sql_mode` settings would reject during storage can
- still be returned during retrieval; it is impossible to predict in advance
- whether such data exists, since clients are free to set `sql_mode` to any
- value at any time.
-
-### Efficiency
-
-For purely store-and-retrieve applications, MySQL's query planner (which
-transforms the miniature program contained in each SQL statement into a tree
-of disk access and data manipulation steps) is sufficient, but only barely.
-Queries that retrieve data from one table, or from one table and a small
-number of one-to-maybe-one related tables, produce relatively efficient plans.
-
-MySQL, however, offers a number of tuning options that can have dramatic and
-counterintuitive effects, and the documentation provides very little advice
-for choosing settings. Tuning relies on the administrator's personal
-experience, blog articles of varying quality, and consultants.
-
-* The MySQL query cache defaults to a non-zero size in some commonly-installed
- configurations. However, the larger the cache, the slower writes proceed:
- invalidating cache entries that include the tables modified by a query means
- considering every entry in the cache. This cache also uses MySQL's LRU
- implementation, which has its own performance problems during eviction that
- get worse with larger cache sizes.
-* Memory-management settings, including `key_buffer_size` and `innodb_buffer_pool_size`,
- have non-linear relationships with performance. The [standard](http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/)
- [advice](http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/) advises
- making whichever value you care about more to a large value, but this can be
- counterproductive if the related data is larger than the pool can hold:
- MySQL is once again bad at discarding old buffer pages when the buffer is
- exhausted, leading to dramatic slowdowns when query load reaches a certain
- point.
- * This also affects filesystem tuning settings such as `table_open_cache`.
-* InnoDB, out of the box, comes configured to use one large (and automatically
- growing) tablespace file for all tables, complicating backups and storage
- management. This is fine for trivial databases, but MySQL provides no tools
- (aside from `DROP TABLE` and reloading the data from an SQL dump) for
- transplanting a table to another tablespace, and provides no tools (aside
- from a filesystem-level `rm`, and reloading _all_ InnoDB data from an SQL
- dump) for reclaiming empty space in a tablespace file.
-* MySQL itself provides very few tools to manage storage; tasks like storing
- large or infrequently-accessed tables and databases on dedicated filesystems
- must be done on the filesystem, with MySQL shut down.
-
-## Data Processing
-
-Data processing encompasses tasks that require making decisions about data and
-tasks that derive new data from existing data. This is a huge range of topics:
-
-* Deciding (and enforcing) application-specific validity rules.
-* Summarizing and deriving data.
-* Providing and maintaining alternate representations and structures.
-* Hosting complex domain logic near the data it operates on.
-
-The further towards data processing tasks applications move, the more their
-SQL resembles tiny programs sent to the data. MySQL is totally unprepared for
-programs, and expects SQL to retrieve or modify simple rows.
-
-### Validity
-
-Good constraints are like `assert`s: in an ideal world, you can't tell if they
-work, because your code never violates them. Here in the real world,
-constraint violations happen for all sorts of reasons, ranging from buggy code
-to buggy human cognition. A good database gives you more places to describe
-your expectations and more tools for detecting and preventing surprises.
-MySQL, on the other hand, can't validate your data for you, beyond simple (and
-fixed) type constraints:
-
-* As with the data you store in it, MySQL feels free to change your table
- definitions [implicitly and
- silently](https://dev.mysql.com/doc/refman/5.5/en/silent-column-changes.html).
- Many of these silent schema changes have important performance and
- feature-availability implications.
- * Foreign keys are ignored if you spell them certain, common, ways:
-
- CREATE TABLE foo (
- -- ...,
- parent INTEGER
- NOT NULL
- REFERENCES foo_parent (id)
- -- , ...
- )
-
- silently ignores the foreign key specification, while
-
- CREATE TABLE foo (
- -- ...,
- parent INTEGER
- NOT NULL,
- FOREIGN KEY (parent)
- REFERENCES foo_parent (id)
- -- , ...
- )
-
- preserves it.
-
-* Foreign keys, one of the most widely-used database validity checks, are an
- engine-specific feature, restricting their availability in combination with
- other engine-specific features. (For example, a table cannot have both
- foreign key constraints and full-text indexes, as of MySQL 5.5.)
- * Configurations that violate assumptions about foreign keys, such as a
- foreign key pointing into a MyISAM or NDB table, do not cause warnings
- or any other diagnostics. The foreign key is simply discarded. SURPRISE.
- (MySQL is riddled with these sorts of surprises, and apologists lean
- very heavily on the “that's documented” excuse for its bad behaviour.)
-* The MySQL parser recognizes `CHECK` clauses, which allow schema developers
- to make complex declarative assertions about tuples in the database, but
- [discards them without
- warning](https://dev.mysql.com/doc/refman/5.5/en/create-table.html). If you
- want `CHECK`-like constraints, you must implement them as triggers - but see
- below...
-* MySQL's comprehension of the `DEFAULT` clause is, uh, limited: only
- constants are permitted, except for the [special
- case](https://dev.mysql.com/doc/refman/5.5/en/timestamp-initialization.html)
- of at most one `TIMESTAMP` column per table and at most one sequence-derived
- column. Who designed this mess?
- * Furthermore, there's no way to say “no default” and raise an error when
- an INSERT forgets to provide a value. The default `DEFAULT` is either
- `NULL` or a zero-like constant (`0`, `''`, and so on). Even for types
- with no meaningful zero-like values (`DATETIME`).
-* MySQL has no mechanism for introducing new types, which might otherwise
- provide a route to enforcing validity. Counting the number of special cases
- in MySQL's [existing type
- system](https://dev.mysql.com/doc/refman/5.5/en/data-types.html) illustrates
- why that's probably unfixable.
-
-I hope every client with write access to your data is absolutely perfect,
-because MySQL _cannot help you_ if you make a mistake.
-
-### Summarizing and Deriving Data
-
-SQL databases generally provide features for doing “interesting” things with
-sets of tuples, and MySQL is no exception. However, MySQL's limitations mean
-that actually processing data in the database is fraught with wasted money,
-brains, and time:
-
-* Aggregate (`GROUP BY`) queries run up against limits in MySQL's query
- planner: a query with both `WHERE` and `GROUP BY` clauses can only satisfy
- one constraint or the other with indexes, unless there's an index that
- covers all the relevant fields in both clauses, in the right order. (What
- this order is depends on the complexity of the query and on the distribution
- of the underlying data, but that's hardly MySQL-specific.)
- * If you have all three of `WHERE`, `GROUP BY`, and `ORDER BY` in the same
- query, you're more or less fucked. Good luck designing a single index
- that satisfies all three.
-* Even though MySQL allows database administrators to [define normal functions
- in a procedural SQL
- dialect](https://dev.mysql.com/doc/refman/5.5/en/create-procedure.html),
- [custom aggregate
- functions](https://dev.mysql.com/doc/refman/5.5/en/create-function-udf.html)
- can only be defined by native plugins. Good thing, too, because procedural
- SQL in MySQL is its own kind of awful - more on that below.
-* Subqueries are often convenient and occasionally necessary for expressing
- multi-step transformations on some underlying data. MySQL's query planner
- has only one strategy for optimizing them: evaluate the innermost query as
- written, into an in-memory table, then use a nested loop to satisfy joins or
- `IN` clauses. For large subquery results or interestingly nested subqueries,
- this is absurdly slow.
- * MySQL's query planner can't fold constraints from outer queries into
- subqueries.
- * The generated in-memory table never has any indexes, ever, even when
- appropriate indexes are “obvious” from the surrounding query; you cannot
- even specify them.
- * These limitations also affect views, which are evaluated as if they were
- subqueries. In combination with the lack of constraint folding in the
- planner, this makes filtering or aggregating over large views completely
- impractical.
- * MySQL lacks [common table
- expressions](https://www.postgresql.org/docs/9.2/static/queries-with.html).
- Even if subquery efficiency problems get fixed, the inability to give
- meaningful names to subqueries makes them hard to read and comprehend.
- * I hope you like `CREATE TEMPORARY TABLE AS SELECT`, because that's your
- only real alternative.
-* [Window
- functions](https://en.wikipedia.org/wiki/Select_(SQL)#Window_function) do not
- exist at all in MySQL. This complicates many kinds of analysis, including
- time series analyses and ranking analyses.
- * Specific cases (for example, assigning rank numbers to rows) can be
- implemented using [server-side variables and side effects during
- `SELECT`](https://stackoverflow.com/questions/6473800/assigning-row-rank-numbers).
- What? Good luck understanding that code in six months.
-* Even interesting joins run into trouble. MySQL's query planner has trouble
- with a number of cases that can easily arise in well-normalized data:
- * Joining and ordering by rows from multiple tables often forces MySQL to
- dump the whole join to a temporary table, then sort it -- awful,
- especially if you then use `LIMIT BY` to paginate the results.
- * `JOIN` clauses with non-trivial conditions, such as joins by range or
- joins by similarity, generally cause the planner to revert to table
- scans even if the same condition would be indexable outside of a join.
- * Joins with `WHERE` clauses that span both tables, where the rows
- selected by the `WHERE` clause are outliers relative to the table
- statistics, often cause MySQL to access tables in suboptimal order.
-* Ok, forget about interesting joins. Even interesting `WHERE` clauses can run
- into trouble: MySQL can't index deterministic functions of a row, either.
- While some deterministic functions can be eliminated from the `WHERE` clause
- using simple algebra, many useful cases (whitespace-insensitive comparison,
- hash-based comparisons, and so on) can't.
- * You can fake these by storing the computed value in the row alongside
- the “real” value. This leaves your schema with some ugly data repetition
- and a chance for the two to fall out of sync, and clients must use the
- “computed” column explicitly.
- * Oh, and they must maintain the “computed” version explicitly.
- * Or you can use triggers. Ha. See above.
-
-And now you know why MySQL advocates are such big fans of doing data
-_processing_ in “the client” or “the app.”
-
-### Alternate Representations and Derived Tables
-
-Many databases let schema designers and administrators abstract the underlying
-“physical” table structure from the presentation given to clients, or to some
-specific clients, for any of a number of reasons. MySQL tries to let you do
-this, too! And fumbles it quite badly.
-
-* As mentioned above, non-trivial views are basically useless. Queries like
- `SELECT some columns FROM a_view WHERE id = 53` are evaluated in the
- stupidest -- and slowest -- possible way. Good luck hiding unusual
- partitioning arrangements or a permissions check in a view if you want any
- kind of performance.
-* The poor interactions between triggers and binary logging's default
- configuration make it impractical to use triggers to maintain “materialized”
- views to avoid the problems with “real” views.
- * It also effectively means triggers can't be used to emulate `CHECK`
- constraints and other consistency features.
- * Code to maintain materialized views is also finicky and hard to get
- “right,” especially if the view includes aggregates or interesting joins
- over its source data. I hope you enjoy debugging MySQL's procedural
- SQL…
-* For the relatively common case of wanting to abstract partitioned storage
- away for clients, MySQL actually has [a
- tool](https://dev.mysql.com/doc/refman/5.5/en/partitioning.html) for it! But
- it comes with [enough caveats to strangle a
- horse](https://dev.mysql.com/doc/refman/5.5/en/partitioning-limitations.html):
- * It's a separate table engine wrapping a “real” storage engine, which
- means it has its own, separate support for engine-specific features:
- transactions, foreign keys, and index types, `AUTO_INCREMENT`, and
- others. The syntax for configuring partitions makes selecting the wrong
- underlying engine entirely too easy, too.
- * Partitioned tables may not be the referrent of foreign keys: you can't
- have both enforced relationships and this kind of storage management.
- * MySQL doesn't actually know how to store partitions on separate disks or
- filesystems. You still need to reach underneath of MySQL do to actual
- storage management.
- * Partitioning an InnoDB table under the default InnoDB configuration
- stores all of the partitions in the global tablespace file anyways.
- Helpful! For per-table configurations, they still all end up
- together in the same file. Partitioning InnoDB tables is a waste of
- time for managing storage.
- * TL,DR: MySQL's partition support is so finicky and limited that
- MySQL-based apps tend to opt for multiple MySQL servers (“sharding”)
- instead.
-
-### Hosting Logic In The Database
-
-Yeah, yeah, the usual reaction to stored procedures and in-DB code is “eww,
-yuck!” for some not-terrible reasons, but hear me out on two points:
-
-* Under the freestanding-database-server paradigm, there will usually be
- network latency between database clients and the database itself. There are
- two ways to minimize the impact of that: move the data to the code in bulk
- to minimize round-trips, or move the code to the data.
-* Some database administration tasks are better implemented using in-database
- code than as freestanding clients: complex data migrations that can't be
- expressed as freestanding SQL queries, for example.
-
-MySQL, as of version
-[5.0](https://dev.mysql.com/doc/relnotes/mysql/5.0/en/news-5-0-0.html)
-(released in 2003 -- remember that date, I'll come back to it), has support
-for in-database code via a procedural SQL-like dialect, like many other SQL
-databases. This includes server-side procedures (blocks of stored code that
-are invoked outside of any other statements and return statement-like
-results), functions (blocks of stored code that compute a result, used in any
-expression context such as a `SELECT` list or `WHERE` clause), and triggers
-(blocks of stored code that run whenever a row is created, modified, or
-deleted).
-
-Given the examples of
-[other](https://www.postgresql.org/docs/7.3/static/plpgsql.html)
-[contemporaneous](https://msdn.microsoft.com/en-US/library/ms189826(v=sql.90).aspx)
-[procedural](https://docs.oracle.com/cd/B10501_01/appdev.920/a96624/toc.htm)
-[languages](https://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd15-psql.html),
-MySQL's procedural dialect -- an implementation of the
-[SQL/PSM](https://en.wikipedia.org/wiki/SQL/PSM) language -- is quite limited:
-
-* There is no language construct for looping over a query result. This seems
- like a pretty fundamental feature for a database-hosted language, but no.
-* There is no language construct for looping while a condition holds. This
- seems like a pretty fundamental feature for an imperative language designed
- any time after about 1975, but no.
-* There is no language construct for looping over a range.
-* There is, in fact, one language construct for looping: the unconditional
- loop. All other iteration control is done via conditional `LEAVE`
- statements, as
-
- BEGIN
- DECLARE c CURSOR FOR
- SELECT foo, bar, baz
- FROM some_table
- WHERE some_condition;
- DECLARE done INT DEFAULT 0;
- DECLARE CONTINUE HANDLER FOR NOT FOUND
- SET done = 1;
-
- DECLARE c_foo INTEGER;
- DECLARE c_bar INTEGER;
- DECLARE c_baz INTEGER;
-
- OPEN c;
- process_some_table: LOOP
- FETCH c INTO c_foo, c_bar, c_baz;
- IF done THEN
- LEAVE process_some_table;
- END IF;
-
- -- do something with c_foo, c_bar, c_baz
- END LOOP;
- END;
-
- The original “structured programming” revolution in the 1960s seems to
- have passed the MySQL team by.
-
-* Okay, I lied. There are two looping constructs: there's also the `REPEAT ...
- UNTIL condition END REPEAT` construct, analogous to C's `do {} while
- (!condition);` loop. But you still can't loop over query results, and you
- can't run zero iterations of the loop's main body this way.
-* There is nothing resembling a modern exception system with automatic scoping
- of handlers or declarative exception management. Error handling is entirely
- via Visual Basic-style “on condition X, do Y” instructions, which remain in
- effect for the rest of the program's execution.
- * In the language shipped with MySQL 5.0, there wasn't a way to signal
- errors, either: programmers had to resort to stunts like [intentionally
- issuing failing
- queries](https://stackoverflow.com/questions/465727/raise-error-within-mysql-function),
- instead. Later versions of the language addressed this with the
- [`SIGNAL`
- statement](https://dev.mysql.com/doc/refman/5.5/en/signal.html): see,
- they _can_ learn from better languages, eventually.
-* You can't escape to some other language, since MySQL doesn't have an
- extension mechanism for server-side languages or a good way to call
- out-of-process services during queries.
-
-The net result is that developing MySQL stored programs is unpleasant,
-uncomfortable, and far more error-prone than it could have been.
-
-## Why Is MySQL The Way It Is? { #by-design }
-
-MySQL's technology and history contain the seeds of all of these flaws.
-
-### Pluggable Storage Engines
-
-Very early in MySQL's life, the MySQL dev team realized that MyISAM was not
-the only way to store data, and opted to support other storage backends within
-MySQL. This is basically an alright idea; while I personally prefer storage
-systems that focus their effort on making one backend work very well,
-supporting multiple backends and letting third-party developers write their
-own is a pretty good approach too.
-
-Unfortunately, MySQL's storage backend interface puts a very low ceiling on
-the ways storage backends can make MySQL behave better.
-
-MySQL's data access paths through table engines are very simple: MySQL asks
-the engine to open a table, asks the engine to iterate through the table
-returning rows, filters the rows itself (outside of the storage engine), then
-asks the engine to close the table. Alternately, MySQL asks the engine to open
-a table, asks the engine to retrieve rows in range or for a single value over
-a specific index, filters the rows itself, and asks the engine to close the
-table.
-
-This simplistic interface frees table engines from having to worry about query
-optimization - in theory. Unfortunately, engine-specific features have a large
-impact on the performance of various query plans, but the channels back to the
-query planner provide very little granularity for estimating cost and prevent
-the planner from making good use of the engine in unusual cases. Conversely,
-the table engine system is totally isolated from the actual query, and can't
-make query-dependent performance choices “on its own.” There's no third path;
-the query planner itself is not pluggable.
-
-Similar consequences apply to type checking, support for new types, or even
-something as “obvious” as multiple automatic `TIMESTAMP` columns in the same
-table.
-
-Table manipulation -- creation, structural modification, and so on -- runs
-into similar problems. MySQL itself parses each `CREATE TABLE` statement, then
-hands off a parsed representation to the table engine so that it can manage
-storage. The parsed representation is lossy: there are plenty of forms MySQL's
-parser recognizes that aren't representable in a `TABLE` structure, preventing
-engines from implementing, say, column or tuple `CHECK` constraints without
-MySQL's help.
-
-The [sheer number of table
-engines](https://dev.mysql.com/doc/refman/5.5/en/storage-engines.html) makes
-that help very slow in coming. Any change to the table engine interface means
-perturbing the code to each engine, making progress on new MySQL-level
-features that interact with storage such as better query planning or new SQL
-constructs necessarily slow to implement and slow to test.
-
-### Held Back By History
-
-The original MySQL team focused on pure read performance and on “ease of use”
-(for new users with simple needs, as far as I can tell) over correctness and
-completeness, violating Knuth's laws of optimization. Many of these decisions
-locked MySQL into behaviours very early in its life that it still displays
-now. Features like implicit type conversions legitimately do help streamline
-development in very simple cases; experience with [other
-languages](http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/)
-unfortunately shows that the same behaviours sandbag development and help hide
-bugs in more sophisticated scenarios.
-
-MySQL has since changed hands, and the teams working on MySQL (and MariaDB,
-and Percona) are much more mature now than the team that made those early
-decisions. MySQL's massive and frequently non-savvy userbase makes it very hard
-to introduce breaking changes. At the same time, adding _optional_ breaking
-changes via server and client mode flags (such as `sql_mode`) increases the
-cognitive overhead of understanding MySQL's behaviours -- especially when that
-behaviour can vary from client to client, or when the server's configuration is
-out of the user's control (for example, on a shared host, or on EC2).
-
-A solution similar to Python's `from __future__ import` pragmas for making
-breaking changes opt-in some releases in advance of making them mandatory
-might help, but MySQL doesn't have the kind of highly-invested, highly-skilled
-user base that would make that effective -- and it still has all of the
-problems of modal behaviour.
-
-## Bad Arguments
-
-Inevitably, someone's going to come along and tell me how wrong I am and how
-MySQL is just fine as a database system. These people are everywhere, and they
-mean well too, and they are almost all wrong. There are two good reasons to
-use MySQL:
-
-1. **Some earlier group wrote for it, and we haven't finished porting our code
- off of MySQL.**
-2. **We've considered all of these points, and many more, and decided that
- `___feature_x___` that MySQL offers is worth the hassle.**
-
-Unfortunately, these aren't the reasons people do give, generally. The
-following are much more common:
-
-* **It's good enough.** No it ain't. There are plenty of other equally-capable
- data storage systems that don't come with MySQL's huge raft of edge cases
- and quirks.
- * **We haven't run into these problems.** Actually, a lot of these
- problems happen _silently_. Odds are, unless you write your queries and
- schema statements with the manual open and refer back to it constantly,
- or have been using MySQL since the 3.x era _daily_, at least some of
- these issues have bitten you. The ones that prevent you from using your
- database intelligently are very hard to notice in action.
-* **We already know how to use it.** MySQL development and administration
- causes brain damage, folks, the same way PHP does. Where PHP teaches
- programmers that “array” is the only structure you need, MySQL teaches
- people that databases are awkward, slow, hard-to-tune monsters that require
- constant attention. That doesn't have to be true; there are comfortable,
- fast, and easily-tuned systems out there that don't require daily care and
- feeding or the love of a specialist.
-* **It's the only thing our host supports.** [Get](https://linode.com/) [a](https://www.heroku.com/) [better](https://gandi.net/) [host](https://www.engineyard.com). It's
- not like they're expensive or hard to find.
- * **We used it because it was there.** Please hire some fucking software
- developers and go back to writing elevator pitches and flirting with Y
- Combinator.
-* **Everybody knows MySQL. It's easy to hire MySQL folks.** It's easy to hire
- MCSEs, too, but you should be hiring for attitude and ability to learn, not
- for specific skillsets, if you want to run a successful software project.
- * **It's popular.** Sure, and nobody ever got fired for buying
- IBM/Microsoft/Adobe. Popularity isn't any indication of quality, and if
- we let popularity dictate what technology we use and improve we'll never
- get anywhere. Marketing software to geeks is _easy_ - it's just that
- lots of high-quality projects don't bother.
-* **It's lightweight.** So's [SQLite 3](https://www.sqlite.org) or
- [H2](https://www.h2database.com/html/main.html). If you care about deployment
- footprint more than any other factor, MySQL is actually pretty clunky (and
- embedded MySQL has even bigger problems than freestanding MySQL).
-* **It's getting better, so we might as well stay on it.** [It's
- true](https://dev.mysql.com/doc/refman/5.6/en/mysql-nutshell.html), if you go
- by feature checklists and the manual, MySQL is improving “rapidly.” 5.6 is
- due out soon and superficially looks to contain a number of good changes. I
- have two problems with this line of reasoning:
- 1. Why wait? Other databases are good _now_, not _eventually_.
- 2. MySQL has a history of providing the bare minimum to satisfy a feature
- checkbox without actually making the feature work well, work consistently,
- or work in combination with other features.