From b4e1d897d127263507f21ece87b4e95d8103d56a Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Thu, 3 Jan 2013 20:04:32 -0500 Subject: Imported "Merging Structural Changes" verbatim. --- wiki/dev/merging-structural-changes.md | 86 ++++++++++++++++++++++++++++++++++ 1 file changed, 86 insertions(+) create mode 100644 wiki/dev/merging-structural-changes.md (limited to 'wiki') diff --git a/wiki/dev/merging-structural-changes.md b/wiki/dev/merging-structural-changes.md new file mode 100644 index 0000000..46f8766 --- /dev/null +++ b/wiki/dev/merging-structural-changes.md @@ -0,0 +1,86 @@ +# Merging Structural Changes + +Recently, a project I'm working on set out to reinvent their build process, +migrating from a mass of poorly-written Ant scripts to Maven and reorganizing +their source tree in the process. The development process is based on having a +branch per client, so there is a lot of ongoing development on the original +layout for clients that haven't been migrated yet. We discovered that our +version control tool, [Subversion](http://subversion.tigris.org/), is unable +to merge the changes between client branches on the old structure and the +trunk on the new structure automatically. + +Curiosity piqued, I cooked up a script that reproduces the problem and +performs the merge from various directions to examine the results. Subversion, +sadly, performed dismally: none of the merge scenarios tested retained content +changes when merging structural changes to the same files. + +## The Preferred Outcome + +![Both changes survive the +merge.](/media/dev/merging-structural-changes/ideal-merge-results) + +The diagram above shows a very simple source tree with one directory, `dir-a`, +containing one file with two lines in it. On one branch, the file is modified +to have a third line; on another branch, the directory is renamed to `dir-b`. 
+Then, both branches are merged, and the resulting tree contains both sets of +changes: the file has three lines, and the directory has a new name. + +This is the preferred outcome, as no changes are lost or require manual +merging. + +## Subversion + +![Subversion loses the content +change.](/media/dev/merging-structural-changes/subversion-merge-results) + +There are two merge scenarios in this diagram, with almost the same outcome. +On the left, a working copy of the branch where the file's content changed is +checked out, then the changes from the branch where the structure changed are +merged in. On the right, a working copy of the branch where the structure +changed is checked out, then the changes from the branch where the content +changed are merged in. In both cases, the result of the merge has the new +directory name, and the original file contents. In one case, the merge +triggers a rather opaque warning about a "missing file"; in the other, the +merge silently ignores the content changes. + +This is a consequence of the way Subversion implements renames and copies. +When Subversion assembles a changeset for committing to the repository, it +comes up with a list of primitive operations that reproduce the change. There +is no primitive that says "this object was moved," only primitives which say +"this object was deleted" or "this object was added, as a copy of that +object." When you move a file in Subversion, those two operations are +scheduled. Later, when Subversion goes to merge content changes to the +original file, all it sees is that the file has been deleted; it's completely +unaware that there is a new name for the same file. + +This would be fairly easy to remedy by adding a "this object was moved to that +object" primitive to the changeset language, and [a bug report for just such a +feature](http://subversion.tigris.org/issues/show_bug.cgi?id=898) was filed in +2002. 
However, by that time Subversion's repository and changeset formats had +essentially frozen, as Subversion was approaching a 1.0 release and more +important bugs _without_ workarounds were a priority. + +There is some work going on in Subversion 1.6 to handle tree conflicts (the +kind of conflicts that come from this kind of structural change) more +sensibly, which will cause the two merges above to generate a Conflict result, +which is not as good as automatically merging it but far better than silently +ignoring changes. + +## Mercurial + +![Mercurial preserves the content +change.](/media/dev/merging-structural-changes/mercurial-merge-results) + +Interestingly, there are tools which get this merge scenario right: the +diagram above shows how [Mercurial](http://www.selenic.com/mercurial/) handles +the same two tests. Since its changeset language does include an "object +moved" primitive, it's able to take a content change for `dir-a/file` and +apply it to `dir-b/file` if appropriate. + +## Further Resources + +If you feel like reproducing this yourself, or want to adapt my test scripts +to work with your favourite version control system, the scripts are +[available](http://alchemy.grimoire.ca/hg/tree-conflicts) from Mercurial or as +a [zip](http://alchemy.grimoire.ca/hg/tree-conflicts/archive/tip.zip). Patches +and suggestions welcome. -- cgit v1.2.3 From 074759948e4a015d90278966b50e2b216abfc4f0 Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Thu, 3 Jan 2013 20:06:18 -0500 Subject: "Recently" in 2008. It's 2013 now. 
--- wiki/dev/merging-structural-changes.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'wiki') diff --git a/wiki/dev/merging-structural-changes.md b/wiki/dev/merging-structural-changes.md index 46f8766..e1a7c1e 100644 --- a/wiki/dev/merging-structural-changes.md +++ b/wiki/dev/merging-structural-changes.md @@ -1,11 +1,11 @@ # Merging Structural Changes -Recently, a project I'm working on set out to reinvent their build process, +In 2008, a project I was working on set out to reinvent their build process, migrating from a mass of poorly-written Ant scripts to Maven and reorganizing -their source tree in the process. The development process is based on having a -branch per client, so there is a lot of ongoing development on the original -layout for clients that haven't been migrated yet. We discovered that our -version control tool, [Subversion](http://subversion.tigris.org/), is unable +their source tree in the process. The development process was based on having +a branch per client, so there was a lot of ongoing development on the original +layout for clients that hadn't been migrated yet. We discovered that our +version control tool, [Subversion](http://subversion.tigris.org/), was unable to merge the changes between client branches on the old structure and the trunk on the new structure automatically. -- cgit v1.2.3 From 0fb4cb644d9408e44f4c5ab30a60700b0418f120 Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Thu, 3 Jan 2013 20:06:33 -0500 Subject: Those scripts have long since vanished. --- wiki/dev/merging-structural-changes.md | 8 -------- 1 file changed, 8 deletions(-) (limited to 'wiki') diff --git a/wiki/dev/merging-structural-changes.md b/wiki/dev/merging-structural-changes.md index e1a7c1e..0c3970f 100644 --- a/wiki/dev/merging-structural-changes.md +++ b/wiki/dev/merging-structural-changes.md @@ -76,11 +76,3 @@ diagram above shows how [Mercurial](http://www.selenic.com/mercurial/) handles the same two tests. 
Since its changeset language does include an "object moved" primitive, it's able to take a content change for `dir-a/file` and apply it to `dir-b/file` if appropriate. - -## Further Resources - -If you feel like reproducing this yourself, or want to adapt my test scripts -to work with your favourite version control system, the scripts are -[available](http://alchemy.grimoire.ca/hg/tree-conflicts) from Mercurial or as -a [zip](http://alchemy.grimoire.ca/hg/tree-conflicts/archive/tip.zip). Patches -and suggestions welcome. -- cgit v1.2.3 From 93fea46af6e89b6e0dc32e8598e79929d80973cd Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Thu, 3 Jan 2013 20:08:24 -0500 Subject: Git can merge structural changes, kind of --- wiki/dev/merging-structural-changes.md | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'wiki') diff --git a/wiki/dev/merging-structural-changes.md b/wiki/dev/merging-structural-changes.md index 0c3970f..f597d39 100644 --- a/wiki/dev/merging-structural-changes.md +++ b/wiki/dev/merging-structural-changes.md @@ -76,3 +76,10 @@ diagram above shows how [Mercurial](http://www.selenic.com/mercurial/) handles the same two tests. Since its changeset language does include an "object moved" primitive, it's able to take a content change for `dir-a/file` and apply it to `dir-b/file` if appropriate. + +## Git + +Git also gets this scenario right, _usually_. Unlike Mercurial, Git does not +track file copies or renames in its commits at all, preferring to infer them by +content comparison every time it performs a move-aware operation, such as a +merge. 
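
To make "infer them by content comparison" concrete, here is a minimal Python sketch of the idea. It is not Git's implementation: the function names, the use of `difflib`, and the 0.5 similarity threshold are all assumptions chosen for illustration. Trees are modelled as plain dictionaries from path to file content.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 score for how alike two file contents are."""
    return SequenceMatcher(None, a, b).ratio()


def infer_renames(old_tree: dict, new_tree: dict, threshold: float = 0.5) -> dict:
    """Guess which deleted paths were really moved to newly added paths.

    Hypothetical helper: pairs each path that vanished from old_tree with
    the most similar path that only exists in new_tree, provided the
    similarity reaches the (illustrative) threshold.
    """
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}

    renames = {}
    for old_path, old_content in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_content in added.items():
            if new_path in renames.values():
                continue  # already claimed by another deleted path
            score = similarity(old_content, new_content)
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
    return renames


before = {"dir-a/file": "line one\nline two\n"}
after = {"dir-b/file": "line one\nline two\n"}
print(infer_renames(before, after))
```

Because nothing is recorded at commit time, a file that is moved and heavily rewritten in the same change can fall below whatever threshold is in use and be treated as an unrelated delete and add, which is one way the "usually" above shows up.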
-- cgit v1.2.3 From 1b8e018ba9a2562e4215598e75fc91f63582db71 Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Thu, 3 Jan 2013 21:01:04 -0500 Subject: Imported draft of single-responsibility article --- wiki/dev/rich-shared-models.md | 102 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 wiki/dev/rich-shared-models.md (limited to 'wiki') diff --git a/wiki/dev/rich-shared-models.md b/wiki/dev/rich-shared-models.md new file mode 100644 index 0000000..7309dbe --- /dev/null +++ b/wiki/dev/rich-shared-models.md @@ -0,0 +1,102 @@ +# Rich Shared Models Must Die + +In a gaming system I once worked on, there was a single class which was +responsible for remembering everything about a user: their name and contact +information, their wagers, their balance, and every other fact about a user +the system cared about. In a system I'm working with now, there's a set of +classes that collaborate to track everything about the domain: prices, +descriptions, custom search properties, and so on. + +Both of these are examples of shared, system-wide models. + +Shared models are evil. + +Shared models _must be destroyed_. + +A software system's model is the set of functions and data types it uses to +decide what to do in response to various events. Models embody the development +team's assumptions and knowledge about the problem space, and usually reflect +the structure of the applications that use them. Not all systems have explicit +models, and it's often hard to draw a line through the code base separating +the code that is the model from the code that is not as every programmer sees +models slightly differently. + +With the rise of object-oriented development, explicit models became the focus +of several well-known practices. Many medium-to-large projects are built +"model first", with the interfaces to that model being sketched out later in +the process. 
Since the model holds the system's understanding of its task, +this makes sense, and so long as you keep the problem you're actually solving +in mind, it works well. Unfortunately, it's too easy to lose sight of the +problem and push the model as the whole reason for the system around it. This, +in combination with both emotional and technical investment in any existing +system, strongly encourages building `new` systems around the existing +model pieces even if the relationship between the new system and that model is tenuous at +best. + +* Why do we share them? + * Unmanaged growth + * Adding features to an existing system + * Building new systems on top of existing tools + * Misguided applications of "simplicity" and "reuse" + * Encouraged by distributed object systems (CORBA, EJB, SOAP, COM) +* What are the consequences? + * Models end up holding behaviour and data relevant to many applications + * Every application using the model has to make the same assumptions + * Changing the model usually requires upgrading everyone at the same time + * Changes to the model are risky and impact many applications, even if the + changes are only relevant to one application +* What should we do instead? + * Narrow, flat interfaces + * Each system is responsible for its own modelling needs + * Systems share data and protocols, not objects + * Libraries are good, if the entire world doesn't need to upgrade at the + same time + +It's easy to start building a system by figuring out what the various nouns it +cares about are. In the gambling example, one of our nouns was a user (the guy +sitting at a web browser somewhere), who would be able to log in, deposit +money, place a wager, and would have to be notified when the wager was +settled. This is a clear, reasonable entity for describing the goal of placing +bets online, which we could make reasonable assumptions about. It's also a +terrible thing to turn into a class. 
+ +The User class in our gambling system was responsible for all of those things; +as a result, every part of the system ended up using a User object somewhere. +Because the User class had many responsibilities, it was subject to frequent +changes; because it was used everywhere, those changes had the capability to +break nearly any part of the overall system. Worse, because so much +functionality was already in one place, it became psychologically easy to add +one more responsibility to its already-bloated interface. + +What had been a clean model in the problem space eventually became one of a +handful of "glue" pieces in a [big ball of +mud](http://www.laputan.org/mud/mud.html#BigBallOfMud) program. The User +object did not come about through conscious design, but rather through +evolution from a simple system. There was no clear point where User became +"too big"; instead, the vagueness of its role slowly grew until it became the +default behaviour-holder for all things user-specific. + +The same problem modeling exercise also points at a better way to design the +same system: it describes a number of capabilities the system needed to be +able to perform, each of which is simpler than "build a gaming website." Each +of these capabilities (accept or reject logins, process deposits, accept and +settle wagers, and send out notification emails to players) has a much simpler +model and solves a much more constrained problem. There is no reason the +authentication service needs to share any data except an identity with the +wagering service: one cares about login names, passwords, and authorization +tickets while the other cares about accounting, wins and losses, and posted +odds. 
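
As a rough illustration of the paragraph above, the following Python sketch shows two services that each keep their own small model and exchange nothing but a username string. The class and method names are hypothetical, not taken from the system described, and the plaintext password store is a simplification that a real service would replace with hashing.

```python
class AuthenticationService:
    """Knows about login names and passwords, and nothing else."""

    def __init__(self):
        self._passwords = {}  # username -> password (illustration only)

    def register(self, username: str, password: str) -> None:
        self._passwords[username] = password

    def login(self, username: str, password: str):
        """Return the username as the shared identity, or None on failure."""
        if username in self._passwords and self._passwords[username] == password:
            return username
        return None


class WageringService:
    """Knows about balances and wagers, and nothing about passwords."""

    def __init__(self):
        self._balances = {}  # username -> balance in cents

    def deposit(self, username: str, amount: int) -> None:
        self._balances[username] = self._balances.get(username, 0) + amount

    def place_wager(self, username: str, stake: int) -> bool:
        """Deduct the stake if the balance covers it; report success."""
        balance = self._balances.get(username, 0)
        if stake > balance:
            return False
        self._balances[username] = balance - stake
        return True

    def balance(self, username: str) -> int:
        return self._balances.get(username, 0)


auth = AuthenticationService()
wagers = WageringService()

auth.register("alice", "s3cret")
identity = auth.login("alice", "s3cret")  # a plain string crosses the boundary
wagers.deposit(identity, 5000)
wagers.place_wager(identity, 1200)
print(wagers.balance(identity))
```

Neither class imports or subclasses the other, so either one can change its internal model without forcing an upgrade of the other.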
+ +There is a small set of key facts that can be used to correlate all of the pieces: +usernames, which uniquely identify a user, can be used to associate data and +behaviour in the login domain with data and behaviour in the accounting and +wagering domain, and with information in a contact management domain. All of +these key facts are flat: they have very little structure and no behaviour, and +can be passed from service to service without dragging along an entire +application's worth of baggage data. + +Sharing model classes between many services creates a huge maintenance +bottleneck. Isolating models within the services they support helps encourage +clean separations between services, which in turn makes it much easier to +understand individual services and much easier to maintain the system as a +whole. Kindergarten lied: sharing is _wrong_. -- cgit v1.2.3 From 3cbde77ef52dfbfe35538cea5e0213c931db459d Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Thu, 3 Jan 2013 21:24:46 -0500 Subject: Imported draft about liquibase --- wiki/dev/liquibase.md | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) create mode 100644 wiki/dev/liquibase.md (limited to 'wiki') diff --git a/wiki/dev/liquibase.md b/wiki/dev/liquibase.md new file mode 100644 index 0000000..6e5e97d --- /dev/null +++ b/wiki/dev/liquibase.md @@ -0,0 +1,77 @@ +# Liquibase + +Note to self: I think this (a) needs an outline and (b) wants to become a "how +to automate db upgrades for dummies" page. Also, this is really old (~2008) +and many things have changed: database migration tools are more +widely-available and mature now. On the other hand, I still see a lot of +questions on IRC that are based on not even knowing these tools exist. + +----- + +Successful software projects are characterized by extensive automation and +supporting tools. 
For source code, we have version control tools that support +tracking and reviewing changes, marking particular states for release, and +automating builds. For databases, the situation is rather less advanced in a +lot of places: outside of Rails, which has some rather nice +[migration](http://wiki.rubyonrails.org/rails/pages/understandingmigrations) +support, and [evolutions](http://code.google.com/p/django-evolution/) or +[South](http://south.aeracode.org) for Django, there are few tools that +actually track changes to the database or to the model in a reproducible way. + +While I was exploring the problem by writing some scripts for my own projects, +I came to a few conclusions. You need to keep a receipt for the changes a +database has been exposed to in the database itself so that the database can +be reproduced later. You only need scripts to go forward from older versions +to newer versions. Finally, you need to view DDL statements as a degenerate +form of diff, between two database states, that's not combinable the way +textual diff is except by concatenation. + +Someone on IRC mentioned [Liquibase](http://www.liquibase.org/) and +[migrate4j](http://migrate4j.sourceforge.net/) to me. Since I was already in +the middle of writing a second version of my own scripts to handle the issues +I found writing the first version, I stopped and compared notes. + +Liquibase is essentially the tool I was trying to write, only with two years +of relatively talented developer time poured into it rather than six weeks. + +Liquibase operates off of a version table it maintains in the database itself, +which tracks what changes have been applied to the database, and off of a +configuration file listing all of the database changes. Applying new changes +to a database is straightforward: by default, it goes through the file and +applies all the changes that are in the file that are not already in the +database, in order. 
This ensures that incremental changes during development +are reproduced in exactly the same way during deployment, something lots of +model-to-database migration tools have a problem with. + +The developers designed the configuration file around some of the ideas from +[Refactoring +Databases](http://www.amazon.com/Refactoring-Databases-Evolutionary-Addison-Wesley-Signature/dp/0321293533), +and provided an [extensive list of canned +changes](http://www.liquibase.org/manual/home#available_database_refactorings) +as primitives in the database change scripts. However, it's also possible to +insert raw SQL commands (either DDL, or DML queries like `SELECT`s and +`INSERT`s) at any point in the change sequence if some change to the database +can't be accomplished with its set of refactorings. For truly hairy databases, +you can use either a Java class implementing your change logic or a shell +script alongside the configuration file. + +The tools for applying database changes to databases are similarly flexible: +out of the box, liquibase can be embedded in a fairly wide range of Java +applications using servlet context listeners, a Spring adapter, or a Grails +adapter; it can also be run from an ant or maven build, or as a standalone +tool. + +My biggest complaint is that liquibase is heavily Java-centric; while the +developers are planning .Net support, it'd be nice to use it for Python apps +as well. Triggering liquibase upgrades from anything other than a Java program +involves either shelling out to the `java` command or creating a JVM and +writing native glue to control the upgrade process, which are both pretty +painful. I'm also less than impressed with the javadoc documentation; while +the manual is excellent, the javadocs are fairly incomplete, making it hard to +write customized integrations. + +The liquibase developers deserve a lot of credit for solving a hard problem +very cleanly. 
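
The core mechanism described earlier (a table in the database records which changes have been applied, and an upgrade runs only the ones that are missing, in order) can be sketched in a few lines of Python using the standard `sqlite3` module. This is an illustration of the technique only, not Liquibase's actual file format, table layout, or API; the `applied_changes` table, the `CHANGELOG` list, and `apply_pending` are hypothetical names.

```python
import sqlite3

# An ordered list of (change id, SQL) pairs, standing in for the
# configuration file that lists every database change.
CHANGELOG = [
    ("1-create-users", "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    ("2-add-email", "ALTER TABLE users ADD COLUMN email TEXT"),
]


def apply_pending(conn: sqlite3.Connection, changelog) -> list:
    """Apply every change not yet recorded in the database, in order.

    Returns the ids of the changes that were run by this call.
    """
    # The receipt lives in the database itself.
    conn.execute("CREATE TABLE IF NOT EXISTS applied_changes (id TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT id FROM applied_changes")}

    ran = []
    for change_id, sql in changelog:
        if change_id in applied:
            continue  # already applied to this database; skip it
        conn.execute(sql)
        conn.execute("INSERT INTO applied_changes (id) VALUES (?)", (change_id,))
        ran.append(change_id)
    conn.commit()
    return ran


conn = sqlite3.connect(":memory:")
print(apply_pending(conn, CHANGELOG))  # first run applies everything
print(apply_pending(conn, CHANGELOG))  # second run finds nothing to do
```

Running the same function against a development database and a production database replays the same steps in the same order, which is the reproducibility property the article credits to this design.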
+ +*[DDL]: Data Definition Language +*[DML]: Data Manipulation Language \ No newline at end of file -- cgit v1.2.3 From 3ea9f16c84395a7c7a0164a581adaca639590859 Mon Sep 17 00:00:00 2001 From: Owen Jacobson Date: Thu, 3 Jan 2013 21:09:11 -0500 Subject: Imported notes about branches/twigs --- wiki/dev/twigs.md | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) create mode 100644 wiki/dev/twigs.md (limited to 'wiki') diff --git a/wiki/dev/twigs.md b/wiki/dev/twigs.md new file mode 100644 index 0000000..ebc875c --- /dev/null +++ b/wiki/dev/twigs.md @@ -0,0 +1,24 @@ +# Branches and Twigs + +## Twigs + +* Relatively short-lived +* Share the commit policy of their parent branch +* Gain little value from global names +* Examples: most "topic branches" are twigs + +## Branches + +* Relatively long-lived +* Correspond to differences in commit policy +* Gain lots of value from global names +* Examples: git-flow 'master', 'develop', &c; hg 'stable' vs 'default'; + release branches + +## Commit policy + +* Decisions like "should every commit pass tests?" and "is rewriting or + deleting a commit acceptable?" are, collectively, the policy of a branch +* Can be very formal or even tool-enforced, or ad-hoc and fluid +* Shared understanding of commit policy helps get everyone's expectations + lined up, easing other SCM-mediated conversations -- cgit v1.2.3