From e3ce4796498cf9995495f9ab767dfe4926b097bd Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Mon, 21 Jan 2013 17:43:05 -0500 Subject: Started transcribing my notes on why MySQL is awful. --- wiki/mysql/choose-something-else.md | 53 +++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) create mode 100644 wiki/mysql/choose-something-else.md (limited to 'wiki') diff --git a/wiki/mysql/choose-something-else.md b/wiki/mysql/choose-something-else.md new file mode 100644 index 0000000..96a1d43 --- /dev/null +++ b/wiki/mysql/choose-something-else.md @@ -0,0 +1,53 @@ +# Considering MySQL? Choose Something Else + +Thesis: databases fill roles ranging between pure storage and extensive data +processing; MySQL is differently bad at both poles. + +(Real apps fall between these poles, and suffer variably from either set of +MySQL flaws.) + +## Storage + +Storage systems have four properties: + +1. Take and store data they receive from applications. +2. Keep that data safe against loss or accidental change. +3. Provide stored data to applications on demand. +4. Give administrators effective management tools. + +In a truly "pure" storage application, data-comprehension features +(constraints and relationships, nontrivial functions and aggregates) would go +totally unused. There is a time and a place for this: the return of "NoSQL" +storage systems attests to that. + +Pure storage systems tend to be closely coupled to their "main" application: +consider most web/server app databases. "Secondary" clients tend to be +read-only (reporting applications, monitoring) or to be utilities in service +of the main application (migration tools, documentation tools). If you believe +constraints, validity checks, and other comprehension features can be +implemented in "the application", you are probably thinking of databases close +to this pole. + +### Storing Data + +MySQL has many edge cases which reduce the predictability of its behaviour +when storing information. Most of these edge cases are documented, but violate +the principle of least surprise (not to mention the expectations of users +familiar with other SQL implementations). + +* Implicit conversions (particularly to and from string types) can modify + MySQL's behaviour. + * Many implicit conversions are also silent (no warning, no diagnostic), + by design, making it more likely developers are entirely unaware of + them until one does something surprising. +* Conversions that violate basic constraints (range, length) of the output + type often coerce data rather than failing. + * Sometimes this raises a warning; does your app check for those? + * This behaviour is unlike many typed systems (but closely like PHP and + remotely like Perl). +* Conversion behaviour depends on a per-connection configuration value with a + large constellation of possible states, making it harder to carry + expectations from manual testing over to code. + +### Preserving Data + -- cgit v1.2.3 From 5bf52597d6f19aea53894e975accfdf026e14c16 Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Mon, 21 Jan 2013 18:12:57 -0500 Subject: Added some notes on why storing data in MySQL is unsafe. --- wiki/mysql/choose-something-else.md | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) (limited to 'wiki') diff --git a/wiki/mysql/choose-something-else.md b/wiki/mysql/choose-something-else.md index 96a1d43..24b1fa1 100644 --- a/wiki/mysql/choose-something-else.md +++ b/wiki/mysql/choose-something-else.md @@ -6,6 +6,12 @@ processing; MySQL is differently bad at both poles. (Real apps fall between these poles, and suffer variably from either set of MySQL flaws.) +In the first section, I'll talk about [why MySQL is bad at storing +data](#storage). In the second, I'll talk about [why MySQL is bad at +processing data](#processing). In the third, I'll talk about why these +problems are inherent in the way MySQL was built and are not likely to be +fixed in the foreseeable future. + ## Storage Storage systems have four properties: @@ -51,3 +57,33 @@ familiar with other SQL implementations). ### Preserving Data +... against unexpected changes: like most disk-backed storage systems, MySQL +is as reliable as the disks and filesystems its data lives on. + +... against loss: hoo boy. MySQL's default approach to +[backups](http://dev.mysql.com/doc/refman/5.5/en/backup-methods.html) is to +either dump data to SQL periodically (slow and an inefficient use of space, +with little support for incremental backups) or to use binary logs (plus a +base filesystem backup) for ongoing backups. (Binary logs are also the basis +of MySQL's first-party replication system.) If neither of these are +sufficient, you're left with purchasing [a backup tool from +Oracle](http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_mysql_enterprise_backup). + +Like many of MySQL's features, the binary logging feature is +[too](http://dev.mysql.com/doc/refman/5.5/en/binary-log.html) +[configurable](http://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html), +while still, somehow, defaulting to modes that are hazardous or surprising: +the +[default](http://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html#sysvar_binlog_format) +[behaviour](http://dev.mysql.com/doc/refman/5.5/en/replication-formats.html) +is to log SQL statements, rather than logging their side effects. This has +lead to numerous bugs over the years; MySQL now makes an effort to make common +"non-deterministic" cases such as `NOW()` and `RANDOM()` act deterministically +but these have been addressed using ad-hoc solutions. Restoring +binary-log-based backups can easily lead to data that differs from the +original system, and by the time you've noticed the problem, it's too late to +do anything about it. + +(Seriously. The binlog contains the current time on the master and the random +seed for every statement, just in case. If your non-deterministic query uses +any other function, you're still fucked by default.) \ No newline at end of file -- cgit v1.2.3 From 4a9e5931022d2375f8eaa9382d38b7da8426851a Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Tue, 22 Jan 2013 17:05:08 -0500 Subject: Better title. --- wiki/mysql/choose-something-else.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'wiki') diff --git a/wiki/mysql/choose-something-else.md b/wiki/mysql/choose-something-else.md index 24b1fa1..a951e57 100644 --- a/wiki/mysql/choose-something-else.md +++ b/wiki/mysql/choose-something-else.md @@ -1,4 +1,8 @@ -# Considering MySQL? Choose Something Else +# Do Not Pass This Way Again + +Considering MySQL? Use something else. Already on MySQL? Migrate. For every +successful project built on MySQL, you could uncover a history of time wasted +mitigating MySQL's inadequacies. Thesis: databases fill roles ranging between pure storage and extensive data processing; MySQL is differently bad at both poles. -- cgit v1.2.3 From b39842a64b5b7718f4523f2e5abcf1fb1bc3fd58 Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Tue, 22 Jan 2013 17:11:40 -0500 Subject: Clearer outline for backups. --- wiki/mysql/choose-something-else.md | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-) (limited to 'wiki') diff --git a/wiki/mysql/choose-something-else.md b/wiki/mysql/choose-something-else.md index a951e57..3d043e3 100644 --- a/wiki/mysql/choose-something-else.md +++ b/wiki/mysql/choose-something-else.md @@ -62,16 +62,26 @@ familiar with other SQL implementations). ### Preserving Data ... against unexpected changes: like most disk-backed storage systems, MySQL -is as reliable as the disks and filesystems its data lives on. - -... against loss: hoo boy. MySQL's default approach to -[backups](http://dev.mysql.com/doc/refman/5.5/en/backup-methods.html) is to -either dump data to SQL periodically (slow and an inefficient use of space, -with little support for incremental backups) or to use binary logs (plus a -base filesystem backup) for ongoing backups. (Binary logs are also the basis -of MySQL's first-party replication system.) If neither of these are -sufficient, you're left with purchasing [a backup tool from -Oracle](http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_mysql_enterprise_backup). +is as reliable as the disks and filesystems its data lives on. However, the +implicit conversion rules that bite when storing data also bite when asking +MySQL to modify data - my favourite example being a fat-fingered `UPDATE` +query where a mistyped `=` (as `-`, off by a single key) caused 90% of the +rows in the table to be affected, instead of one row, because of implicit +string-to-integer conversions. + +... against loss: hoo boy. MySQL, out of the box, gives you two approachesœ to +[backups](http://dev.mysql.com/doc/refman/5.5/en/backup-methods.html): + +* Dump to SQL with `mysqldump`: slow, relatively large backups, and + non-incremental, or +* Archive binary logs: fragile, complex, over-configurable, and configured + badly by default. (Binary logging is also the basis of MySQL's replication + system.) + +If neither of these are sufficient, you're left with purchasing [a backup tool +from +Oracle](http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_mysql_enterprise_backup) +or from one of the third-party MySQL vendors. Like many of MySQL's features, the binary logging feature is [too](http://dev.mysql.com/doc/refman/5.5/en/binary-log.html) -- cgit v1.2.3 From f792fe876b00f912b3249de0469ebb0ac32cf944 Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Wed, 23 Jan 2013 11:44:54 -0500 Subject: Completed the "storage in mysql sucks" draft. --- markdoc.yaml | 1 + wiki/mysql/choose-something-else.md | 138 ++++++++++++++++++++++++++++-------- 2 files changed, 108 insertions(+), 31 deletions(-) (limited to 'wiki') diff --git a/markdoc.yaml b/markdoc.yaml index 266e475..06800e5 100644 --- a/markdoc.yaml +++ b/markdoc.yaml @@ -3,3 +3,4 @@ wiki-name: "The Codex" markdown: extensions: - extra + - headerid diff --git a/wiki/mysql/choose-something-else.md b/wiki/mysql/choose-something-else.md index 3d043e3..212b8f7 100644 --- a/wiki/mysql/choose-something-else.md +++ b/wiki/mysql/choose-something-else.md @@ -2,19 +2,15 @@ Considering MySQL? Use something else. Already on MySQL? Migrate. For every successful project built on MySQL, you could uncover a history of time wasted -mitigating MySQL's inadequacies. +mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense +of accomplishment over the effort spent making MySQL behave. -Thesis: databases fill roles ranging between pure storage and extensive data -processing; MySQL is differently bad at both poles. +Thesis: databases fill roles ranging from pure storage to complex and +interesting data processing; MySQL is differently bad at both tasks. Real apps +all fall somewhere between these poles, and suffer variably from both sets of +MySQL flaws. -(Real apps fall between these poles, and suffer variably from either set of -MySQL flaws.) - -In the first section, I'll talk about [why MySQL is bad at storing -data](#storage). In the second, I'll talk about [why MySQL is bad at -processing data](#processing). In the third, I'll talk about why these -problems are inherent in the way MySQL was built and are not likely to be -fixed in the foreseeable future. +* MySQL is bad at [storage](#storage). ## Storage @@ -55,25 +51,43 @@ familiar with other SQL implementations). * Sometimes this raises a warning; does your app check for those? * This behaviour is unlike many typed systems (but closely like PHP and remotely like Perl). -* Conversion behaviour depends on a per-connection configuration value with a - large constellation of possible states, making it harder to carry - expectations from manual testing over to code. +* Conversion behaviour depends on a per-connection configuration value + (`sql_mode`) that has [a large constellation of possible + states](http://dev.mysql.com/doc/refman/5.5/en/server-sql-mode.html), making + it harder to carry expectations from manual testing over to code or from + tool to tool. +* MySQL uses non-standard and rather unique interpretations of several common + character encodings, including UTF-8 and Latin-1. Implementation details of + these encodings within MySQL, such as the `utf8` encoding's MySQL-specific + 3-byte limit, tend to leak out into client applications. Data that does not + fit MySQL's understanding of the storage encoding will be transformed until + it does, by truncation or replacement, by default. ### Preserving Data ... against unexpected changes: like most disk-backed storage systems, MySQL -is as reliable as the disks and filesystems its data lives on. However, the -implicit conversion rules that bite when storing data also bite when asking -MySQL to modify data - my favourite example being a fat-fingered `UPDATE` -query where a mistyped `=` (as `-`, off by a single key) caused 90% of the -rows in the table to be affected, instead of one row, because of implicit -string-to-integer conversions. +is as reliable as the disks and filesystems its data lives on. MySQL makes +very little effort to do its own storage validation and error correction, but +this is a limitation shared with many, _many_ other systems. + +The implicit conversion rules that bite when storing data also bite when +asking MySQL to modify data - my favourite example being a fat-fingered +`UPDATE` query where a mistyped `=` (as `-`, off by a single key) caused 90% +of the rows in the table to be affected, instead of one row, because of +implicit string-to-integer conversions. ... against loss: hoo boy. MySQL, out of the box, gives you two approachesœ to [backups](http://dev.mysql.com/doc/refman/5.5/en/backup-methods.html): +* Take "blind" filesystem backups with `tar` or `rsync`. Unless you + meticulously lock tables or make the database read-only for the duration, + this produces a backup that requires crash recovery before it will be + usable, and can produce an inconsistent database. + * This can bite quite hard if you use InnoDB, as InnoDB crash recovery + takes time proportional to both the number of InnoDB tables and the + total size of InnoDB tables, with a large constant. * Dump to SQL with `mysqldump`: slow, relatively large backups, and - non-incremental, or + non-incremental. * Archive binary logs: fragile, complex, over-configurable, and configured badly by default. (Binary logging is also the basis of MySQL's replication system.) @@ -91,13 +105,75 @@ the [default](http://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html#sysvar_binlog_format) [behaviour](http://dev.mysql.com/doc/refman/5.5/en/replication-formats.html) is to log SQL statements, rather than logging their side effects. This has -lead to numerous bugs over the years; MySQL now makes an effort to make common -"non-deterministic" cases such as `NOW()` and `RANDOM()` act deterministically -but these have been addressed using ad-hoc solutions. Restoring -binary-log-based backups can easily lead to data that differs from the -original system, and by the time you've noticed the problem, it's too late to -do anything about it. - -(Seriously. The binlog contains the current time on the master and the random -seed for every statement, just in case. If your non-deterministic query uses -any other function, you're still fucked by default.) \ No newline at end of file +lead to numerous bugs over the years; MySQL (now) makes an effort to make +common "non-deterministic" cases such as `NOW()` and `RANDOM()` act +deterministically but these have been addressed using ad-hoc solutions. +Restoring binary-log-based backups can easily lead to data that differs from +the original system, and by the time you've noticed the problem, it's too late +to do anything about it. + +(Seriously. The binary log entries for each statement contain the "current" +time on the master and the random seed at the start of the statement, just in +case. If your non-deterministic query uses any other function, you're still +[fucked by +default](http://dev.mysql.com/doc/refman/5.5/en/replication-sbr-rbr.html#replication-sbr-rbr-sbr-disadvantages).) + +### Retrieving Data + +This mostly works as expected. Most of the ways MySQL will screw you happen +when you store data, not when you retrieve it. However, there are a few things +that implicitly transform stored data before returning it: + +* `TIMESTAMP` columns (and _only_ `TIMESTAMP` columns) can return + apparently-differing values for the same stored value depending on + per-connection configuration even during read-only operation. This is done + silently and the default behaviour can change as a side effect of non-MySQL + configuration changes in the underlying OS. +* String-typed columns are transformed for encoding on output if the + connection is not using the same encoding as the underlying storage, using + the same rules as the transformation on input. +* Values that stricter `sql_mode` settings would reject during storage can + still be returned during retrieval; it is impossible to predict in advance + whether such data exists, since clients are free to set `sql_mode` to any + value at any time. + +### Efficiency + +For purely store-and-retrieve applications, MySQL's query planner (which +transforms the miniature program contained in each SQL statement into a tree +of disk access and data manipulation steps) is sufficient, but only barely. +Queries that retrieve data from one table, or from one table and a small +number of one-to-maybe-one related tables, produce relatively efficient plans. + +MySQL, however, offers a number of tuning options that can have dramatic and +counterintuitive effects, and the documentation provides very little advice +for choosing settings. Tuning relies on the administrator's personal +experience, blog articles of varying quality, and consultants. + +* The MySQL query cache defaults to a non-zero size in some commonly-installed + configurations. However, the larger the cache, the slower writes proceed: + invalidating cache entries that include the tables modified by a query means + considering every entry in the cache. This cache also uses MySQL's LRU + implementation, which has its own performance problems during eviction that + get worse with larger cache sizes. +* Memory-management settings, including `key_buffer_size` and + `innodb_buffer_pool_size`, have non-linear relationships with performance. + The + [standard](http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/) + [advice](http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/) + advises making whichever value you care about more to a large value, but + this can be counterproductive if the related data is larger than the pool + can hold: MySQL is once again bad at discarding old buffer pages when the + buffer is exhausted, leading to dramatic slowdowns when query load reaches a + certain point. + * This also affects filesystem tuning settings such as `table_open_cache`. +* InnoDB, out of the box, comes configured to use one large (and automatically + growing) tablespace file for all tables, complicating backups and storage + management. This is fine for trivial databases, but MySQL provides no tools + (aside from `DROP TABLE` and reloading the data from an SQL dump) for + transplanting a table to another tablespace, and provides no tools (aside + from a filesystem-level `rm`, and reloading _all_ InnoDB data from an SQL + dump) for reclaiming empty space in a tablespace file. +* MySQL itself provides very few tools to manage storage; tasks like storing + large or infrequently-accessed tables and databases on dedicated filesystems + must be done on the filesystem, with MySQL shut down. -- cgit v1.2.3