Diffstat (limited to 'wiki')
-rw-r--r-- wiki/dev/liquibase.md 77
-rw-r--r-- wiki/dev/merging-structural-changes.md 85
-rw-r--r-- wiki/dev/rich-shared-models.md 102
-rw-r--r-- wiki/dev/twigs.md 24
4 files changed, 288 insertions, 0 deletions
diff --git a/wiki/dev/liquibase.md b/wiki/dev/liquibase.md
new file mode 100644
index 0000000..6e5e97d
--- /dev/null
+++ b/wiki/dev/liquibase.md
@@ -0,0 +1,77 @@
+# Liquibase
+
+Note to self: I think this (a) needs an outline and (b) wants to become a "how
+to automate db upgrades for dummies" page. Also, this is really old (~2008)
+and many things have changed: database migration tools are more
+widely-available and mature now. On the other hand, I still see a lot of
+questions on IRC that are based on not even knowing these tools exist.
+
+-----
+
+Successful software projects are characterized by extensive automation and
+supporting tools. For source code, we have version control tools that support
+tracking and reviewing changes, marking particular states for release, and
+automating builds. For databases, the situation is rather less advanced in a
+lot of places: outside of Rails, which has some rather nice
+[migration](http://wiki.rubyonrails.org/rails/pages/understandingmigrations)
+support, and [evolutions](http://code.google.com/p/django-evolution/) or
+[South](http://south.aeracode.org) for Django, there are few tools that
+actually track changes to the database or to the model in a reproducible way.
+
+While I was exploring the problem by writing some scripts for my own projects,
+I came to a few conclusions. You need to keep a receipt for the changes a
+database has been exposed to in the database itself so that the database can
+be reproduced later. You only need scripts to go forward from older versions
+to newer versions. Finally, you need to view DDL statements as a degenerate
+form of diff, between two database states, that's not combinable the way
+textual diff is except by concatenation.
+
+Someone on IRC mentioned [Liquibase](http://www.liquibase.org/) and
+[migrate4j](http://migrate4j.sourceforge.net/) to me. Since I was already in
+the middle of writing a second version of my own scripts to handle the issues
+I found writing the first version, I stopped and compared notes.
+
+Liquibase is essentially the tool I was trying to write, only with two years
+of relatively talented developer time poured into it rather than six weeks.
+
+Liquibase operates off of a version table it maintains in the database itself,
+which tracks what changes have been applied to the database, and off of a
+configuration file listing all of the database changes. Applying new changes
+to a database is straightforward: by default, it goes through the file and
+applies, in order, every change in the file that is not already recorded in
+the database. This ensures that incremental changes during development
+are reproduced in exactly the same way during deployment, something lots of
+model-to-database migration tools have a problem with.
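+
+As a rough sketch of what such a file looks like: the table, column, and
+author names below are made up for illustration, and the namespace is the one
+recent Liquibase releases use, so check the manual for the version you run.
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Hypothetical changelog: names and ids are illustrative only. -->
+<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog">
+
+    <!-- Each changeSet is identified by its id and author (together with
+         the file it lives in); once it is recorded in the tracking table,
+         it is skipped on later runs. -->
+    <changeSet id="1" author="example">
+        <createTable tableName="person">
+            <column name="id" type="int">
+                <constraints primaryKey="true" nullable="false"/>
+            </column>
+            <column name="name" type="varchar(255)"/>
+        </createTable>
+    </changeSet>
+
+    <!-- Added later in development; applied after changeSet 1. -->
+    <changeSet id="2" author="example">
+        <addColumn tableName="person">
+            <column name="email" type="varchar(255)"/>
+        </addColumn>
+    </changeSet>
+
+</databaseChangeLog>
+```
+
+A database that has already seen change set 1 only gets change set 2 on the
+next run; a fresh database gets both, in file order.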
+
+The developers designed the configuration file around some of the ideas from
+[Refactoring
+Databases](http://www.amazon.com/Refactoring-Databases-Evolutionary-Addison-Wesley-Signature/dp/0321293533),
+and provided an [extensive list of canned
+changes](http://www.liquibase.org/manual/home#available_database_refactorings)
+as primitives in the database change scripts. However, it's also possible to
+insert raw SQL commands (either DDL, or DML queries like `SELECT`s and
+`INSERT`s) at any point in the change sequence if some change to the database
+can't be accomplished with its set of refactorings. For truly hairy databases,
+you can use either a Java class implementing your change logic or a shell
+script alongside the configuration file.
+
+The tools for applying database changes to databases are similarly flexible:
+out of the box, Liquibase can be embedded in a fairly wide range of Java
+applications using servlet context listeners, a Spring adapter, or a Grails
+adapter; it can also be run from an Ant or Maven build, or as a standalone
+tool.
+
+My biggest complaint is that Liquibase is heavily Java-centric; while the
+developers are planning .NET support, it'd be nice to use it for Python apps
+as well. Triggering Liquibase upgrades from anything other than a Java program
+involves either shelling out to the `java` command or creating a JVM and
+writing native glue to control the upgrade process, which are both pretty
+painful. I'm also less than impressed with the javadoc documentation; while
+the manual is excellent, the javadocs are fairly incomplete, making it hard to
+write customized integrations.
+
+The Liquibase developers deserve a lot of credit for solving a hard problem
+very cleanly.
+
+*[DDL]: Data Definition Language
+*[DML]: Data Manipulation Language
\ No newline at end of file
diff --git a/wiki/dev/merging-structural-changes.md b/wiki/dev/merging-structural-changes.md
new file mode 100644
index 0000000..f597d39
--- /dev/null
+++ b/wiki/dev/merging-structural-changes.md
@@ -0,0 +1,85 @@
+# Merging Structural Changes
+
+In 2008, a project I was working on set out to reinvent their build process,
+migrating from a mass of poorly-written Ant scripts to Maven and reorganizing
+their source tree in the process. The development process was based on having
+a branch per client, so there was a lot of ongoing development on the original
+layout for clients that hadn't been migrated yet. We discovered that our
+version control tool, [Subversion](http://subversion.tigris.org/), was unable
+to merge the changes between client branches on the old structure and the
+trunk on the new structure automatically.
+
+Curiosity piqued, I cooked up a script that reproduces the problem and
+performs the merge from various directions to examine the results. Subversion,
+sadly, performed dismally: none of the merge scenarios tested retained content
+changes when merging structural changes to the same files.
+
+## The Preferred Outcome
+
+![Both changes survive the
+merge.](/media/dev/merging-structural-changes/ideal-merge-results)
+
+The diagram above shows a very simple source tree with one directory, `dir-a`,
+containing one file with two lines in it. On one branch, the file is modified
+to have a third line; on another branch, the directory is renamed to `dir-b`.
+Then, both branches are merged, and the resulting tree contains both sets of
+changes: the file has three lines, and the directory has a new name.
+
+This is the preferred outcome, as no changes are lost or require manual
+merging.
+
+## Subversion
+
+![Subversion loses the content
+change.](/media/dev/merging-structural-changes/subversion-merge-results)
+
+There are two merge scenarios in this diagram, with almost the same outcome.
+On the left, a working copy of the branch where the file's content changed is
+checked out, then the changes from the branch where the structure changed are
+merged in. On the right, a working copy of the branch where the structure
+changed is checked out, then the changes from the branch where the content
+changed are merged in. In both cases, the result of the merge has the new
+directory name, and the original file contents. In one case, the merge
+triggers a rather opaque warning about a "missing file"; in the other, the
+merge silently ignores the content changes.
+
+This is a consequence of the way Subversion implements renames and copies.
+When Subversion assembles a changeset for committing to the repository, it
+comes up with a list of primitive operations that reproduce the change. There
+is no primitive that says "this object was moved," only primitives which say
+"this object was deleted" or "this object was added, as a copy of that
+object." When you move a file in Subversion, those two operations are
+scheduled. Later, when Subversion goes to merge content changes to the
+original file, all it sees is that the file has been deleted; it's completely
+unaware that there is a new name for the same file.
+
+This would be fairly easy to remedy by adding a "this object was moved to that
+object" primitive to the changeset language, and [a bug report for just such a
+feature](http://subversion.tigris.org/issues/show_bug.cgi?id=898) was filed in
+2002. However, by that time Subversion's repository and changeset formats had
+essentially frozen, as Subversion was approaching a 1.0 release and more
+important bugs _without_ workarounds were a priority.
+
+There is some work going on in Subversion 1.6 to handle tree conflicts (the
+kind of conflicts that come from this kind of structural change) more
+sensibly. With it, the two merges above will generate a Conflict result,
+which is not as good as merging automatically but far better than silently
+ignoring changes.
+
+## Mercurial
+
+![Mercurial preserves the content
+change.](/media/dev/merging-structural-changes/mercurial-merge-results)
+
+Interestingly, there are tools which get this merge scenario right: the
+diagram above shows how [Mercurial](http://www.selenic.com/mercurial/) handles
+the same two tests. Since its changeset language does include an "object
+moved" primitive, it's able to take a content change for `dir-a/file` and
+apply it to `dir-b/file` if appropriate.
+
+## Git
+
+Git also gets this scenario right, _usually_. Unlike Mercurial, Git does not
+track file copies or renames in its commits at all, preferring to infer them by
+content comparison every time it performs a move-aware operation, such as a
+merge.
diff --git a/wiki/dev/rich-shared-models.md b/wiki/dev/rich-shared-models.md
new file mode 100644
index 0000000..7309dbe
--- /dev/null
+++ b/wiki/dev/rich-shared-models.md
@@ -0,0 +1,102 @@
+# Rich Shared Models Must Die
+
+In a gaming system I once worked on, there was a single class which was
+responsible for remembering everything about a user: their name and contact
+information, their wagers, their balance, and every other fact about a user
+the system cared about. In a system I'm working with now, there's a set of
+classes that collaborate to track everything about the domain: prices,
+descriptions, custom search properties, and so on.
+
+Both of these are examples of shared, system-wide models.
+
+Shared models are evil.
+
+Shared models _must be destroyed_.
+
+A software system's model is the set of functions and data types it uses to
+decide what to do in response to various events. Models embody the development
+team's assumptions and knowledge about the problem space, and usually reflect
+the structure of the applications that use them. Not all systems have explicit
+models, and it's often hard to draw a line through the code base separating
+the code that is the model from the code that is not as every programmer sees
+models slightly differently.
+
+With the rise of object-oriented development, explicit models became the focus
+of several well-known practices. Many medium-to-large projects are built
+"model first", with the interfaces to that model being sketched out later in
+the process. Since the model holds the system's understanding of its task,
+this makes sense, and so long as you keep the problem you're actually solving
+in mind, it works well. Unfortunately, it's too easy to lose sight of the
+problem and push the model as the whole reason for the system around it. This,
+in combination with both emotional and technical investment in any existing
+system, strongly encourages building `new` systems around the existing
+model pieces even if the relationship between the new system and the model
+is tenuous at best.
+
+* Why do we share them?
+ * Unmanaged growth
+ * Adding features to an existing system
+ * Building new systems on top of existing tools
+ * Misguided applications of "simplicity" and "reuse"
+ * Encouraged by distributed object systems (CORBA, EJB, SOAP, COM)
+* What are the consequences?
+ * Models end up holding behaviour and data relevant to many applications
+ * Every application using the model has to make the same assumptions
+ * Changing the model usually requires upgrading everyone at the same time
+ * Changes to the model are risky and impact many applications, even if the
+ changes are only relevant to one application
+* What should we do instead?
+ * Narrow, flat interfaces
+ * Each system is responsible for its own modelling needs
+ * Systems share data and protocols, not objects
+ * Libraries are good, if the entire world doesn't need to upgrade at the
+ same time
+
+It's easy to start building a system by figuring out what the various nouns it
+cares about are. In the gambling example, one of our nouns was a user (the guy
+sitting at a web browser somewhere), who would be able to log in, deposit
+money, place a wager, and would have to be notified when the wager was
+settled. This is a clear, reasonable entity for describing the goal of placing
+bets online, which we could make reasonable assumptions about. It's also a
+terrible thing to turn into a class.
+
+The User class in our gambling system was responsible for all of those things;
+as a result, every part of the system ended up using a User object somewhere.
+Because the User class had many responsibilities, it was subject to frequent
+changes; because it was used everywhere, those changes had the capability to
+break nearly any part of the overall system. Worse, because so much
+functionality was already in one place, it became psychologically easy to add
+one more responsibility to its already-bloated interface.
+
+What had been a clean model in the problem space eventually became one of a
+handful of "glue" pieces in a [big ball of
+mud](http://www.laputan.org/mud/mud.html#BigBallOfMud) program. The User
+object did not come about through conscious design, but rather through
+evolution from a simple system. There was no clear point where User became
+"too big"; instead, the vagueness of its role slowly grew until it became the
+default behaviour-holder for all things user-specific.
+
+The same problem modeling exercise also points at a better way to design the
+same system: it describes a number of capabilities the system needed to be
+able to perform, each of which is simpler than "build a gaming website." Each
+of these capabilities (accept or reject logins, process deposits, accept and
+settle wagers, and send out notification emails to players) has a much simpler
+model and solves a much more constrained problem. There is no reason the
+authentication service needs to share any data except an identity with the
+wagering service: one cares about login names, passwords, and authorization
+tickets while the other cares about accounting, wins and losses, and posted
+odds.
+
+There is a small set of key facts that can be used to correlate all of the pieces:
+usernames, which uniquely identify a user, can be used to associate data and
+behaviour in the login domain with data and behaviour in the accounting and
+wagering domain, and with information in a contact management domain. All of
+these key facts are flat—they have very little structure and no behaviour, and
+can be passed from service to service without dragging along an entire
+application's worth of baggage data.
+
+Sharing model classes between many services creates a huge maintenance
+bottleneck. Isolating models within the services they support helps encourage
+clean separations between services, which in turn makes it much easier to
+understand individual services and much easier to maintain the system as a
+whole. Kindergarten lied: sharing is _wrong_.
diff --git a/wiki/dev/twigs.md b/wiki/dev/twigs.md
new file mode 100644
index 0000000..ebc875c
--- /dev/null
+++ b/wiki/dev/twigs.md
@@ -0,0 +1,24 @@
+# Branches and Twigs
+
+## Twigs
+
+* Relatively short-lived
+* Share the commit policy of their parent branch
+* Gain little value from global names
+* Examples: most "topic branches" are twigs
+
+## Branches
+
+* Relatively long-lived
+* Correspond to differences in commit policy
+* Gain lots of value from global names
+* Examples: git-flow 'master', 'develop', etc.; hg 'stable' vs 'default';
+ release branches
+
+## Commit policy
+
+* Decisions like "should every commit pass tests?" and "is rewriting or
+ deleting a commit acceptable?" are, collectively, the policy of a branch
+* Can be very formal or even tool-enforced, or ad-hoc and fluid
+* Shared understanding of commit policy helps get everyone's expectations
+ lined up, easing other SCM-mediated conversations