From f82d259e7bda843fb63ac1a0f6ff1d6bfb187099 Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Wed, 9 Dec 2015 20:40:42 -0500 Subject: Remove HTML from the project. (We're no longer using Dokku.) --- .html/mysql/choose-something-else.html | 836 --------------------------------- 1 file changed, 836 deletions(-) delete mode 100644 .html/mysql/choose-something-else.html (limited to '.html/mysql/choose-something-else.html') diff --git a/.html/mysql/choose-something-else.html b/.html/mysql/choose-something-else.html deleted file mode 100644 index ca3a7b2..0000000 --- a/.html/mysql/choose-something-else.html +++ /dev/null @@ -1,836 +0,0 @@ - - - - - The Codex » - Do Not Pass This Way Again - - - - - - - - -
- - - - - -
-

Do Not Pass This Way Again

-

Considering MySQL? Use something else. Already on MySQL? Migrate. For every -successful project built on MySQL, you could uncover a history of time wasted -mitigating MySQL's inadequacies, masked by a hard-won, but meaningless, sense -of accomplishment over the effort spent making MySQL behave.

-

Thesis: databases fill roles ranging from pure storage to complex and -interesting data processing; MySQL is differently bad at both tasks. Real apps -all fall somewhere between these poles, and suffer variably from both sets of -MySQL flaws.

- -

Much of this is inspired by the principles behind PHP: A Fractal of Bad -Design. I -suggest reading that article too -- it's got a lot of good thought in it even -if you already know to stay well away from PHP. (If that article offends you, -well, this page probably will too.)

-

Storage

-

Storage systems have four properties:

-
    -
  1. Take and store data they receive from applications.
  2. -
  3. Keep that data safe against loss or accidental change.
  4. -
  5. Provide stored data to applications on demand.
  6. -
  7. Give administrators effective management tools.
  8. -
-

In a truly “pure” storage application, data-comprehension features -(constraints and relationships, nontrivial functions and aggregates) would go -totally unused. There is a time and a place for this: the return of “NoSQL” -storage systems attests to that.

-

Pure storage systems tend to be closely coupled to their “main” application: -consider most web/server app databases. “Secondary” clients tend to be -read-only (reporting applications, monitoring) or to be utilities in service -of the main application (migration tools, documentation tools). If you believe -constraints, validity checks, and other comprehension features can be -implemented in “the application,” you are probably thinking of databases close -to this pole.

-

Storing Data

-

MySQL has many edge cases which reduce the predictability of its behaviour -when storing information. Most of these edge cases are documented, but violate -the principle of least surprise (not to mention the expectations of users -familiar with other SQL implementations).

-
    -
  • Implicit conversions (particularly to and from string types) can modify - MySQL's behaviour.
      -
    • Many implicit conversions are also silent (no warning, no diagnostic), - by design, making it more likely developers are entirely unaware of - them until one does something surprising.
    • -
    -
  • -
  • Conversions that violate basic constraints (range, length) of the output - type often coerce data rather than failing.
      -
    • Sometimes this raises a warning; does your app check for those?
    • -
    • This behaviour is unlike many typed systems (but closely like PHP and - remotely like Perl).
    • -
    -
  • -
  • Conversion behaviour depends on a per-connection configuration value - (sql_mode) that has a large constellation of possible - states, making - it harder to carry expectations from manual testing over to code or from - tool to tool.
  • -
  • MySQL recommends UTF-8 as a character-set, but still defaults to Latin-1. - The implimentation of utf8 up until MySQL 5.5 was only the 3-byte - BMP. - MySQL 5.5 and beyond supports a 4-byte utf8, but confusingly must be set - with the character-set utf8mb4. Implementation details of these encodings - within MySQL, such as the utf8 3-byte limit, tend to leak out into client - applications. Data that does not fit MySQL's understanding of the storage - encoding will be transformed until it does, by truncation or replacement, by - default.
      -
    • Collation support is per-encoding, with one of the stranger default - configurations: by default, the collation orders characters according to - Swedish alphabetization rules, case-insensitively.
    • -
    • Since it's the default, lots of folks who don't know the manual - inside-out and backwards observe MySQL's case-insensitive collation - behaviour ('a' = 'A') and conclude that “MySQL is case-insensitive,” - complicating any effort to use a case-sensitive locale.
    • -
    • Both the encoding and the collation can vary, independently, by - column. Do you keep your schema definition open when you write - queries to watch out for this sort of shit?
    • -
    -
  • -
  • The TIMESTAMP type tries to do something smart by storing values in a - canonical timezone (UTC), but it's done with so few affordances that it's - very hard to even tell that MySQL's done a right thing with your data.
      -
    • And even after that, the result of foo < '2012-04-01 09:00:00' still - depends on what time of year it is when you evaluate the query, unless - you're very careful with your connection timezone.
    • -
    • TIMESTAMP is also special-cased in MySQL's schema definition handling, - making it easy to accidentally create (or to accidentally fail to - create) an auto-updating field when you didn't (did) want one.
    • -
    • DATETIME does not get the same timezone handling TIMESTAMP does. - What? And you can't provide your own without resorting to hacks like - extra columns.
    • -
    • Oh, did you want to use MySQL's timezone support? Too bad, none of - that data's loaded by default. You have to process the OS's tzinfo - files into SQL with a separate tool and import that. If you ever want to - update MySQL's timezone settings later, you need to take the server down - just to make sure the changes apply.
    • -
    -
  • -
-

Preserving Data

-

... against unexpected changes: like most disk-backed storage systems, MySQL -is as reliable as the disks and filesystems its data lives on. MySQL provides -no additional functionality in terms of mirroring or hardware failure tolerance -(such as Oracle ASM). -However this is a limitation shared with many, many other systems.

-

When using the InnoDB storage engine (default since MySQL 5.5), MySQL maintains page -checksums in order to detect corruption caused by underlying storage. However, -many third-party software applications, as sell as users upgrading -from earlier versions of MySQL may be using MyISAM, which will frequently corrupt -data files on improper shutdown.

-

The implicit conversion rules that bite when storing data also bite when -asking MySQL to modify data - my favourite example being a fat-fingered -UPDATE query where a mistyped = (as -, off by a single key) caused 90% -of the rows in the table to be affected, instead of one row, because of -implicit string-to-integer conversions.

-

... against loss: hoo boy. MySQL, out of the box, gives you three approaches -to backups:

-
    -
  • Take “blind” filesystem backups with tar or rsync. Unless you - meticulously lock tables or make the database read-only for the duration, - this produces a backup that requires crash recovery before it will be - usable, and can produce an inconsistent database.
      -
    • This can bite quite hard if you use InnoDB, as InnoDB crash recovery - takes time proportional to both the number of InnoDB tables and the - total size of InnoDB tables, with a large constant.
    • -
    -
  • -
  • Dump to SQL with mysqldump: slow, relatively large backups, and - non-incremental.
  • -
  • Archive binary logs: fragile, complex, over-configurable, and configured - badly by default. (Binary logging is also the basis of MySQL's replication - system.)
  • -
-

If neither of these are sufficient, you're left with purchasing a backup tool -from -Oracle -or from one of the third-party MySQL vendors.

-

Like many of MySQL's features, the binary logging feature is -too -configurable, -while still, somehow, defaulting to modes that are hazardous or surprising: -the -default -behaviour -is to log SQL statements, rather than logging their side effects. This has -lead to numerous bugs over the years; MySQL (now) makes an effort to make -common “non-deterministic” cases such as NOW() and RANDOM() act -deterministically but these have been addressed using ad-hoc solutions. -Restoring binary-log-based backups can easily lead to data that differs from -the original system, and by the time you've noticed the problem, it's too late -to do anything about it.

-

(Seriously. The binary log entries for each statement contain the “current” -time on the master and the random seed at the start of the statement, just in -case. If your non-deterministic query uses any other function, you're still -fucked by -default.)

-

Additionally, a number of apparently-harmless features can lead to backups or -replicas wandering out of sync with the original database, in the default -configuration:

-
    -
  • AUTO_INCREMENT and UPDATE statements.
  • -
  • AUTO_INCREMENT and INSERT statements (sometimes). SURPRISE.
  • -
  • Triggers.
  • -
  • User-defined (native) functions.
  • -
  • Stored (procedural SQL) functions.
  • -
  • DELETE ... LIMIT and UPDATE ... LIMIT statements, though if you use - these, you've misunderstood how SQL is supposed to work.
  • -
  • INSERT ... ON DUPLICATE KEY UPDATE statements.
  • -
  • Bulk-loading data with LOAD DATA statements.
  • -
  • Operations on floating-point - values.
  • -
-

Retrieving Data

-

This mostly works as expected. Most of the ways MySQL will screw you happen -when you store data, not when you retrieve it. However, there are a few things -that implicitly transform stored data before returning it:

-
    -
  • -

    MySQL's surreal type conversion system works the same way during SELECT - that it works during other operations, which can lead to queries matching - unexpected rows:

    -
    owen@scratch> CREATE TABLE account (
    -    ->     accountid INTEGER
    -    ->         AUTO_INCREMENT
    -    ->         PRIMARY KEY,
    -    ->     discountid INTEGER
    -    -> );
    -Query OK, 0 rows affected (0.54 sec)
    -
    -owen@scratch> INSERT INTO account
    -    ->     (discountid)
    -    -> VALUES
    -    ->     (0),
    -    ->     (1),
    -    ->     (2);
    -Query OK, 3 rows affected (0.03 sec)
    -Records: 3  Duplicates: 0  Warnings: 0
    -
    -owen@scratch> SELECT *
    -    -> FROM account
    -    -> WHERE discountid = 'banana';
    -+-----------+------------+
    -| accountid | discountid |
    -+-----------+------------+
    -|         1 |          0 |
    -+-----------+------------+
    -1 row in set, 1 warning (0.05 sec)
    -
    -

    Ok, unexpected, but there's at least a warning (do your apps check for -those?) - let's see what it says:

    -
    owen@scratch> SHOW WARNINGS;
    -+---------+------+--------------------------------------------+
    -| Level   | Code | Message                                    |
    -+---------+------+--------------------------------------------+
    -| Warning | 1292 | Truncated incorrect DOUBLE value: 'banana' |
    -+---------+------+--------------------------------------------+
    -1 row in set (0.03 sec)
    -
    -

    I can count on one hand the number of DOUBLE columns in this example and -still have five fingers left over.

    -

    You might think this is an unreasonable example: maybe you should always -make sure your argument types exactly match the field types, and the query -should use 57 instead of 'banana'. (This does actually “fix” the -problem.) It's unrealistic to expect every single user to run SHOW CREATE -TABLE before every single query, or to memorize the types of every column -in your schema, though. This example derived from a technically-skilled -but MySQL-ignorant tester examining MySQL data to verify some behavioural -changes in an app.

    -
      -
    • -

      Actually, you don't even need a table for this: SELECT 0 = 'banana' - returns 1. Did the PHP folks design - MySQL's = operator?

      -
    • -
    • -

      This isn't affected by sql_mode, even though so many other things are.

      -
    • -
    -
  • -
  • -

    TIMESTAMP columns (and only TIMESTAMP columns) can return - apparently-differing values for the same stored value depending on - per-connection configuration even during read-only operation. This is done - silently and the default behaviour can change as a side effect of non-MySQL - configuration changes in the underlying OS.

    -
  • -
  • String-typed columns are transformed for encoding on output if the - connection is not using the same encoding as the underlying storage, using - the same rules as the transformation on input.
  • -
  • Values that stricter sql_mode settings would reject during storage can - still be returned during retrieval; it is impossible to predict in advance - whether such data exists, since clients are free to set sql_mode to any - value at any time.
  • -
-

Efficiency

-

For purely store-and-retrieve applications, MySQL's query planner (which -transforms the miniature program contained in each SQL statement into a tree -of disk access and data manipulation steps) is sufficient, but only barely. -Queries that retrieve data from one table, or from one table and a small -number of one-to-maybe-one related tables, produce relatively efficient plans.

-

MySQL, however, offers a number of tuning options that can have dramatic and -counterintuitive effects, and the documentation provides very little advice -for choosing settings. Tuning relies on the administrator's personal -experience, blog articles of varying quality, and consultants.

-
    -
  • The MySQL query cache defaults to a non-zero size in some commonly-installed - configurations. However, the larger the cache, the slower writes proceed: - invalidating cache entries that include the tables modified by a query means - considering every entry in the cache. This cache also uses MySQL's LRU - implementation, which has its own performance problems during eviction that - get worse with larger cache sizes.
  • -
  • Memory-management settings, including key_buffer_size and innodb_buffer_pool_size, - have non-linear relationships with performance. The standard - advice advises - making whichever value you care about more to a large value, but this can be - counterproductive if the related data is larger than the pool can hold: - MySQL is once again bad at discarding old buffer pages when the buffer is - exhausted, leading to dramatic slowdowns when query load reaches a certain - point.
      -
    • This also affects filesystem tuning settings such as table_open_cache.
    • -
    -
  • -
  • InnoDB, out of the box, comes configured to use one large (and automatically - growing) tablespace file for all tables, complicating backups and storage - management. This is fine for trivial databases, but MySQL provides no tools - (aside from DROP TABLE and reloading the data from an SQL dump) for - transplanting a table to another tablespace, and provides no tools (aside - from a filesystem-level rm, and reloading all InnoDB data from an SQL - dump) for reclaiming empty space in a tablespace file.
  • -
  • MySQL itself provides very few tools to manage storage; tasks like storing - large or infrequently-accessed tables and databases on dedicated filesystems - must be done on the filesystem, with MySQL shut down.
  • -
-

Data Processing

-

Data processing encompasses tasks that require making decisions about data and -tasks that derive new data from existing data. This is a huge range of topics:

-
    -
  • Deciding (and enforcing) application-specific validity rules.
  • -
  • Summarizing and deriving data.
  • -
  • Providing and maintaining alternate representations and structures.
  • -
  • Hosting complex domain logic near the data it operates on.
  • -
-

The further towards data processing tasks applications move, the more their -SQL resembles tiny programs sent to the data. MySQL is totally unprepared for -programs, and expects SQL to retrieve or modify simple rows.

-

Validity

-

Good constraints are like asserts: in an ideal world, you can't tell if they -work, because your code never violates them. Here in the real world, -constraint violations happen for all sorts of reasons, ranging from buggy code -to buggy human cognition. A good database gives you more places to describe -your expectations and more tools for detecting and preventing surprises. -MySQL, on the other hand, can't validate your data for you, beyond simple (and -fixed) type constraints:

-
    -
  • -

    As with the data you store in it, MySQL feels free to change your table - definitions implicitly and - silently. - Many of these silent schema changes have important performance and - feature-availability implications.

    -
      -
    • -

      Foreign keys are ignored if you spell them certain, common, ways:

      -
      CREATE TABLE foo (
      -    -- ...,
      -    parent INTEGER
      -        NOT NULL
      -        REFERENCES foo_parent (id)
      -    -- , ...
      -)
      -
      -

      silently ignores the foreign key specification, while

      -
      CREATE TABLE foo (
      -    -- ...,
      -    parent INTEGER
      -        NOT NULL,
      -    FOREIGN KEY (parent)
      -        REFERENCES foo_parent (id)
      -    -- , ...
      -)
      -
      -

      preserves it.

      -
    • -
    -
  • -
  • -

    Foreign keys, one of the most widely-used database validity checks, are an - engine-specific feature, restricting their availability in combination with - other engine-specific features. (For example, a table cannot have both - foreign key constraints and full-text indexes, as of MySQL 5.5.)

    -
      -
    • Configurations that violate assumptions about foreign keys, such as a - foreign key pointing into a MyISAM or NDB table, do not cause warnings - or any other diagnostics. The foreign key is simply discarded. SURPRISE. - (MySQL is riddled with these sorts of surprises, and apologists lean - very heavily on the “that's documented” excuse for its bad behaviour.)
    • -
    -
  • -
  • The MySQL parser recognizes CHECK clauses, which allow schema developers - to make complex declarative assertions about tuples in the database, but - discards them without - warning. If you - want CHECK-like constraints, you must implement them as triggers - but see - below...
  • -
  • MySQL's comprehension of the DEFAULT clause is, uh, limited: only - constants are permitted, except for the special - case - of at most one TIMESTAMP column per table and at most one sequence-derived - column. Who designed this mess?
      -
    • Furthermore, there's no way to say “no default” and raise an error when - an INSERT forgets to provide a value. The default DEFAULT is either - NULL or a zero-like constant (0, '', and so on). Even for types - with no meaningful zero-like values (DATETIME).
    • -
    -
  • -
  • MySQL has no mechanism for introducing new types, which might otherwise - provide a route to enforcing validity. Counting the number of special cases - in MySQL's existing type - system illustrates - why that's probably unfixable.
  • -
-

I hope every client with write access to your data is absolutely perfect, -because MySQL cannot help you if you make a mistake.

-

Summarizing and Deriving Data

-

SQL databases generally provide features for doing “interesting” things with -sets of tuples, and MySQL is no exception. However, MySQL's limitations mean -that actually processing data in the database is fraught with wasted money, -brains, and time:

-
    -
  • Aggregate (GROUP BY) queries run up against limits in MySQL's query - planner: a query with both WHERE and GROUP BY clauses can only satisfy - one constraint or the other with indexes, unless there's an index that - covers all the relevant fields in both clauses, in the right order. (What - this order is depends on the complexity of the query and on the distribution - of the underlying data, but that's hardly MySQL-specific.)
      -
    • If you have all three of WHERE, GROUP BY, and ORDER BY in the same - query, you're more or less fucked. Good luck designing a single index - that satisfies all three.
    • -
    -
  • -
  • Even though MySQL allows database administrators to define normal functions - in a procedural SQL - dialect, - custom aggregate - functions - can only be defined by native plugins. Good thing, too, because procedural - SQL in MySQL is its own kind of awful - more on that below.
  • -
  • Subqueries are often convenient and occasionally necessary for expressing - multi-step transformations on some underlying data. MySQL's query planner - has only one strategy for optimizing them: evaluate the innermost query as - written, into an in-memory table, then use a nested loop to satisfy joins or - IN clauses. For large subquery results or interestingly nested subqueries, - this is absurdly slow.
      -
    • MySQL's query planner can't fold constraints from outer queries into - subqueries.
    • -
    • The generated in-memory table never has any indexes, ever, even when - appropriate indexes are “obvious” from the surrounding query; you cannot - even specify them.
    • -
    • These limitations also affect views, which are evaluated as if they were - subqueries. In combination with the lack of constraint folding in the - planner, this makes filtering or aggregating over large views completely - impractical.
    • -
    • MySQL lacks common table - expressions. - Even if subquery efficiency problems get fixed, the inability to give - meaningful names to subqueries makes them hard to read and comprehend.
    • -
    • I hope you like CREATE TEMPORARY TABLE AS SELECT, because that's your - only real alternative.
    • -
    -
  • -
  • Window - functions do not - exist at all in MySQL. This complicates many kinds of analysis, including - time series analyses and ranking analyses. -
  • -
  • Even interesting joins run into trouble. MySQL's query planner has trouble - with a number of cases that can easily arise in well-normalized data:
      -
    • Joining and ordering by rows from multiple tables often forces MySQL to - dump the whole join to a temporary table, then sort it -- awful, - especially if you then use LIMIT BY to paginate the results.
    • -
    • JOIN clauses with non-trivial conditions, such as joins by range or - joins by similarity, generally cause the planner to revert to table - scans even if the same condition would be indexable outside of a join.
    • -
    • Joins with WHERE clauses that span both tables, where the rows - selected by the WHERE clause are outliers relative to the table - statistics, often cause MySQL to access tables in suboptimal order.
    • -
    -
  • -
  • Ok, forget about interesting joins. Even interesting WHERE clauses can run - into trouble: MySQL can't index deterministic functions of a row, either. - While some deterministic functions can be eliminated from the WHERE clause - using simple algebra, many useful cases (whitespace-insensitive comparison, - hash-based comparisons, and so on) can't.
      -
    • You can fake these by storing the computed value in the row alongside - the “real” value. This leaves your schema with some ugly data repetition - and a chance for the two to fall out of sync, and clients must use the - “computed” column explicitly.
    • -
    • Oh, and they must maintain the “computed” version explicitly.
    • -
    • Or you can use triggers. Ha. See above.
    • -
    -
  • -
-

And now you know why MySQL advocates are such big fans of doing data -processing in “the client” or “the app.”

-

Alternate Representations and Derived Tables

-

Many databases let schema designers and administrators abstract the underlying -“physical” table structure from the presentation given to clients, or to some -specific clients, for any of a number of reasons. MySQL tries to let you do -this, too! And fumbles it quite badly.

-
    -
  • As mentioned above, non-trivial views are basically useless. Queries like - SELECT some columns FROM a_view WHERE id = 53 are evaluated in the - stupidest -- and slowest -- possible way. Good luck hiding unusual - partitioning arrangements or a permissions check in a view if you want any - kind of performance.
  • -
  • The poor interactions between triggers and binary logging's default - configuration make it impractical to use triggers to maintain “materialized” - views to avoid the problems with “real” views.
      -
    • It also effectively means triggers can't be used to emulate CHECK - constraints and other consistency features.
    • -
    • Code to maintain materialized views is also finicky and hard to get - “right,” especially if the view includes aggregates or interesting joins - over its source data. I hope you enjoy debugging MySQL's procedural - SQL…
    • -
    -
  • -
  • For the relatively common case of wanting to abstract partitioned storage - away for clients, MySQL actually has a - tool for it! But - it comes with enough caveats to strangle a - horse:
      -
    • It's a separate table engine wrapping a “real” storage engine, which - means it has its own, separate support for engine-specific features: - transactions, foreign keys, and index types, AUTO_INCREMENT, and - others. The syntax for configuring partitions makes selecting the wrong - underlying engine entirely too easy, too.
    • -
    • Partitioned tables may not be the referrent of foreign keys: you can't - have both enforced relationships and this kind of storage management.
    • -
    • MySQL doesn't actually know how to store partitions on separate disks or - filesystems. You still need to reach underneath of MySQL do to actual - storage management.
        -
      • Partitioning an InnoDB table under the default InnoDB configuration - stores all of the partitions in the global tablespace file anyways. - Helpful! For per-table configurations, they still all end up - together in the same file. Partitioning InnoDB tables is a waste of - time for managing storage.
      • -
      -
    • -
    • TL,DR: MySQL's partition support is so finicky and limited that - MySQL-based apps tend to opt for multiple MySQL servers (“sharding”) - instead.
    • -
    -
  • -
-

Hosting Logic In The Database

-

Yeah, yeah, the usual reaction to stored procedures and in-DB code is “eww, -yuck!” for some not-terrible reasons, but hear me out on two points:

-
    -
  • Under the freestanding-database-server paradigm, there will usually be - network latency between database clients and the database itself. There are - two ways to minimize the impact of that: move the data to the code in bulk - to minimize round-trips, or move the code to the data.
  • -
  • Some database administration tasks are better implemented using in-database - code than as freestanding clients: complex data migrations that can't be - expressed as freestanding SQL queries, for example.
  • -
-

MySQL, as of version -5.0 -(released in 2003 -- remember that date, I'll come back to it), has support -for in-database code via a procedural SQL-like dialect, like many other SQL -databases. This includes server-side procedures (blocks of stored code that -are invoked outside of any other statements and return statement-like -results), functions (blocks of stored code that compute a result, used in any -expression context such as a SELECT list or WHERE clause), and triggers -(blocks of stored code that run whenever a row is created, modified, or -deleted).

-

Given the examples of -other -contemporaneous -procedural -languages, -MySQL's procedural dialect -- an implementation of the -SQL/PSM language -- is quite limited:

-
    -
  • There is no language construct for looping over a query result. This seems - like a pretty fundamental feature for a database-hosted language, but no.
  • -
  • There is no language construct for looping while a condition holds. This - seems like a pretty fundamental feature for an imperative language designed - any time after about 1975, but no.
  • -
  • There is no language construct for looping over a range.
  • -
  • -

    There is, in fact, one language construct for looping: the unconditional - loop. All other iteration control is done via conditional LEAVE - statements, as

    -
    BEGIN
    -    DECLARE c CURSOR FOR
    -        SELECT foo, bar, baz
    -        FROM some_table
    -        WHERE some_condition;
    -    DECLARE done INT DEFAULT 0;
    -    DECLARE CONTINUE HANDLER FOR NOT FOUND
    -        SET done = 1;
    -
    -    DECLARE c_foo INTEGER;
    -    DECLARE c_bar INTEGER;
    -    DECLARE c_baz INTEGER;
    -
    -    OPEN c;
    -    process_some_table: LOOP
    -        FETCH c INTO c_foo, c_bar, c_baz;
    -        IF done THEN
    -            LEAVE process_some_table;
    -        END IF;
    -
    -        -- do something with c_foo, c_bar, c_baz
    -    END LOOP;
    -END;
    -
    -

    The original “structured programming” revolution in the 1960s seems to -have passed the MySQL team by.

    -
  • -
  • -

    Okay, I lied. There are two looping constructs: there's also the REPEAT ... - UNTIL condition END REPEAT construct, analogous to C's do {} while - (!condition); loop. But you still can't loop over query results, and you - can't run zero iterations of the loop's main body this way.

    -
  • -
  • There is nothing resembling a modern exception system with automatic scoping - of handlers or declarative exception management. Error handling is entirely - via Visual Basic-style “on condition X, do Y” instructions, which remain in - effect for the rest of the program's execution.
      -
    • In the language shipped with MySQL 5.0, there wasn't a way to signal - errors, either: programmers had to resort to stunts like intentionally - issuing failing - queries, - instead. Later versions of the language addressed this with the - SIGNAL - statement: see, - they can learn from better languages, eventually.
    • -
    -
  • -
  • You can't escape to some other language, since MySQL doesn't have an - extension mechanism for server-side languages or a good way to call - out-of-process services during queries.
  • -
-

The net result is that developing MySQL stored programs is unpleasant, -uncomfortable, and far more error-prone than it could have been.

-

Why Is MySQL The Way It Is?

-

MySQL's technology and history contain the seeds of all of these flaws.

-

Pluggable Storage Engines

-

Very early in MySQL's life, the MySQL dev team realized that MyISAM was not -the only way to store data, and opted to support other storage backends within -MySQL. This is basically an alright idea; while I personally prefer storage -systems that focus their effort on making one backend work very well, -supporting multiple backends and letting third-party developers write their -own is a pretty good approach too.

-

Unfortunately, MySQL's storage backend interface puts a very low ceiling on -the ways storage backends can make MySQL behave better.

-

MySQL's data access paths through table engines are very simple: MySQL asks -the engine to open a table, asks the engine to iterate through the table -returning rows, filters the rows itself (outside of the storage engine), then -asks the engine to close the table. Alternately, MySQL asks the engine to open -a table, asks the engine to retrieve rows in range or for a single value over -a specific index, filters the rows itself, and asks the engine to close the -table.

-

This simplistic interface frees table engines from having to worry about query -optimization - in theory. Unfortunately, engine-specific features have a large -impact on the performance of various query plans, but the channels back to the -query planner provide very little granularity for estimating cost and prevent -the planner from making good use of the engine in unusual cases. Conversely, -the table engine system is totally isolated from the actual query, and can't -make query-dependent performance choices “on its own.” There's no third path; -the query planner itself is not pluggable.

-

Similar consequences apply to type checking, support for new types, or even -something as “obvious” as multiple automatic TIMESTAMP columns in the same -table.

-

Table manipulation -- creation, structural modification, and so on -- runs -into similar problems. MySQL itself parses each CREATE TABLE statement, then -hands off a parsed representation to the table engine so that it can manage -storage. The parsed representation is lossy: there are plenty of forms MySQL's -parser recognizes that aren't representable in a TABLE structure, preventing -engines from implementing, say, column or tuple CHECK constraints without -MySQL's help.

-

The sheer number of table -engines makes -that help very slow in coming. Any change to the table engine interface means -perturbing the code to each engine, making progress on new MySQL-level -features that interact with storage such as better query planning or new SQL -constructs necessarily slow to implement and slow to test.

-

Held Back By History

-

The original MySQL team focused on pure read performance and on “ease of use” -(for new users with simple needs, as far as I can tell) over correctness and -completeness, violating Knuth's laws of optimization. Many of these decisions -locked MySQL into behaviours very early in its life that it still displays -now. Features like implicit type conversions legitimately do help streamline -development in very simple cases; experience with other -languages -unfortunately shows that the same behaviours sandbag development and help hide -bugs in more sophisticated scenarios.

-

MySQL has since changed hands, and the teams working on MySQL (and MariaDB, -and Percona) are much more mature now than the team that made those early -decisions. MySQL's massive and frequently non-savvy userbase makes it very hard -to introduce breaking changes. At the same time, adding optional breaking -changes via server and client mode flags (such as sql_mode) increases the -cognitive overhead of understanding MySQL's behaviours -- especially when that -behaviour can vary from client to client, or when the server's configuration is -out of the user's control (for example, on a shared host, or on EC2).

-

A solution similar to Python's from __future__ import pragmas for making -breaking changes opt-in some releases in advance of making them mandatory -might help, but MySQL doesn't have the kind of highly-invested, highly-skilled -user base that would make that effective -- and it still has all of the -problems of modal behaviour.

-

Bad Arguments

-

Inevitably, someone's going to come along and tell me how wrong I am and how -MySQL is just fine as a database system. These people are everywhere, and they -mean well too, and they are almost all wrong. There are two good reasons to -use MySQL:

-
    -
  1. Some earlier group wrote for it, and we haven't finished porting our code - off of MySQL.
  2. -
  3. We've considered all of these points, and many more, and decided that - ___feature_x___ that MySQL offers is worth the hassle.
  4. -
-

Unfortunately, these aren't the reasons people do give, generally. The -following are much more common:

-
    -
  • It's good enough. No it ain't. There are plenty of other equally-capable - data storage systems that don't come with MySQL's huge raft of edge cases - and quirks.
      -
    • We haven't run into these problems. Actually, a lot of these - problems happen silently. Odds are, unless you write your queries and - schema statements with the manual open and refer back to it constantly, - or have been using MySQL since the 3.x era daily, at least some of - these issues have bitten you. The ones that prevent you from using your - database intelligently are very hard to notice in action.
    • -
    -
  • -
  • We already know how to use it. MySQL development and administration - causes brain damage, folks, the same way PHP does. Where PHP teaches - programmers that “array” is the only structure you need, MySQL teaches - people that databases are awkward, slow, hard-to-tune monsters that require - constant attention. That doesn't have to be true; there are comfortable, - fast, and easily-tuned systems out there that don't require daily care and - feeding or the love of a specialist.
  • -
  • It's the only thing our host supports. Get a better host. It's - not like they're expensive or hard to find.
      -
    • We used it because it was there. Please hire some fucking software - developers and go back to writing elevator pitches and flirting with Y - Combinator.
    • -
    -
  • -
  • Everybody knows MySQL. It's easy to hire MySQL folks. It's easy to hire - MCSEs, too, but you should be hiring for attitude and ability to learn, not - for specific skillsets, if you want to run a successful software project.
      -
    • It's popular. Sure, and nobody ever got fired for buying - IBM/Microsoft/Adobe. Popularity isn't any indication of quality, and if - we let popularity dictate what technology we use and improve we'll never - get anywhere. Marketing software to geeks is easy - it's just that - lots of high-quality projects don't bother.
    • -
    -
  • -
  • It's lightweight. So's SQLite 3 or - H2. If you care about deployment - footprint more than any other factor, MySQL is actually pretty clunky (and - embedded MySQL has even bigger problems than freestanding MySQL).
  • -
  • It's getting better, so we might as well stay on it. It's - true, if you go - by feature checklists and the manual, MySQL is improving “rapidly.” 5.6 is - due out soon and superficially looks to contain a number of good changes. I - have two problems with this line of reasoning:
      -
    1. Why wait? Other databases are good now, not eventually.
    2. -
    3. MySQL has a history of providing the bare minimum to satisfy a feature -checkbox without actually making the feature work well, work consistently, -or work in combination with other features.
    4. -
    -
  • -
-
- - - -
-
- - -comments powered by Disqus -
- - - - - -
- - \ No newline at end of file -- cgit v1.2.3