# Stop using `git pull` for deployment!

## The problem

* You have a Git repository containing your project.
* You want to “deploy” that code when it changes.
* You'd rather not download the entire project from scratch for each
  deployment.

## The antipattern

“I know, I'll use `git pull` in my deployment script!”

Stop doing this. Stop teaching other people to do this. It's wrong, and it
will eventually lead to deploying something you didn't want.

Deployment should be based on predictable, known versions of your code.
Ideally, every deployable version has a tag (and you deploy exactly that tag),
but even less formal processes, where you deploy a branch tip, should still be
deploying exactly the code designated for release. `git pull`, however, can
create new commits that exist nowhere except in the deployment environment.

`git pull` is a two-step process:

1. Fetch the current branch's designated upstream remote, to obtain all of the
   remote's new commits.
2. Merge the current branch's designated upstream branch into the current
   branch.
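
To make the two steps concrete, here is a minimal sketch that performs them by
hand in a throwaway sandbox. The `upstream` and `deploy` repository names, the
demo identity, and the commit messages are all made up for illustration; the
merge of `@{upstream}` is a close stand-in for what `git pull` does, not a
literal transcript of its internals.

```shell
set -eu

# Throwaway sandbox; every path and name here is illustrative.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "upstream" stands in for the canonical repository, "deploy" for a clone of it.
git init -q "$sandbox/upstream"
git -C "$sandbox/upstream" commit -q --allow-empty -m "release 1"
git clone -q "$sandbox/upstream" "$sandbox/deploy"

# Upstream moves on after the clone was made.
git -C "$sandbox/upstream" commit -q --allow-empty -m "release 2"

# Step 1: fetch the current branch's upstream remote.
git -C "$sandbox/deploy" fetch -q
# Step 2: merge the current branch's upstream branch into the current branch.
git -C "$sandbox/deploy" merge -q --no-edit '@{upstream}'

upstream_tip=$(git -C "$sandbox/upstream" rev-parse HEAD)
deploy_tip=$(git -C "$sandbox/deploy" rev-parse HEAD)
echo "upstream: $upstream_tip"
echo "deployed: $deploy_tip"
```

In this clean case the merge is a fast-forward, so both hashes agree. That
agreement depends entirely on the clone never having diverged.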

Whenever that merge is not a fast-forward, it creates a merge commit, which
means the actual deployed tree might _not_ be identical to the intended
deployment tree. For example, local changes (intentional or otherwise) will be
preserved and merged into the deployment; once this happens, the actual
deployed commit will _never_ match the intended commit.
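
The failure mode is easy to reproduce. The sketch below uses a throwaway
sandbox with made-up repository names, file names, and commit messages: a
stray commit is made in the deployment clone, upstream publishes a release,
and `git pull` then produces a commit that upstream has never seen.
`--no-rebase` is passed to select merging explicitly, on the assumption that
some Git versions refuse to pull divergent branches until a strategy is
chosen.

```shell
set -eu

# Throwaway sandbox; every path and name here is illustrative.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$sandbox/upstream"
echo "v1" > "$sandbox/upstream/app.txt"
git -C "$sandbox/upstream" add app.txt
git -C "$sandbox/upstream" commit -q -m "release 1"
git clone -q "$sandbox/upstream" "$sandbox/deploy"

# Someone commits a "quick fix" directly in the deployment clone...
echo "debug=on" > "$sandbox/deploy/local.txt"
git -C "$sandbox/deploy" add local.txt
git -C "$sandbox/deploy" commit -q -m "hotfix made on the server"

# ...and upstream publishes the release that is supposed to be deployed.
echo "v2" > "$sandbox/upstream/app.txt"
git -C "$sandbox/upstream" commit -q -am "release 2"

# The antipattern: pull in the deployment clone.
git -C "$sandbox/deploy" pull -q --no-rebase --no-edit

intended=$(git -C "$sandbox/upstream" rev-parse HEAD)
deployed=$(git -C "$sandbox/deploy" rev-parse HEAD)
intended_tree=$(git -C "$sandbox/upstream" rev-parse 'HEAD^{tree}')
deployed_tree=$(git -C "$sandbox/deploy" rev-parse 'HEAD^{tree}')
# rev-list --parents prints the commit followed by its parents.
parent_count=$(( $(git -C "$sandbox/deploy" rev-list --parents -n 1 HEAD | wc -w) - 1 ))

echo "intended commit: $intended"
echo "deployed commit: $deployed (parents: $parent_count)"
```

The deployed commit has two parents and a tree containing `local.txt`, so
neither the commit nor its contents match what was designated for release.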

`git pull` will approximate the right thing “by accident”: if the current
local branch (generally `master`) in the deployment clone is always clean and
always tracks the desired deployment branch, then `git pull` will fast-forward
to exactly the intended commit. This is pretty fragile, though; many Git
commands can cause the local branch to diverge from its upstream branch, and
once that happens, every `git pull` that brings in new upstream commits will
create another merge commit. You can patch around the fragility a bit using
the `--ff-only` option, but that only tells you when your deployment
environment has diverged; it doesn't fix it.
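
As a rough illustration of what `--ff-only` does and does not buy you, the
sketch below diverges a deployment clone on purpose and then pulls with that
option. The sandbox layout, repository names, and commit messages are
hypothetical.

```shell
set -eu

# Throwaway sandbox; every path and name here is illustrative.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$sandbox/upstream"
git -C "$sandbox/upstream" commit -q --allow-empty -m "release 1"
git clone -q "$sandbox/upstream" "$sandbox/deploy"

# Diverge: one commit on each side.
git -C "$sandbox/deploy" commit -q --allow-empty -m "hotfix made on the server"
git -C "$sandbox/upstream" commit -q --allow-empty -m "release 2"

before=$(git -C "$sandbox/deploy" rev-parse HEAD)
if git -C "$sandbox/deploy" pull -q --ff-only 2>/dev/null; then
    ff_result="updated"
else
    ff_result="refused"
fi
after=$(git -C "$sandbox/deploy" rev-parse HEAD)

# Commits only in the deployment clone, and commits only in upstream.
ahead=$(git -C "$sandbox/deploy" rev-list --count '@{upstream}..HEAD')
behind=$(git -C "$sandbox/deploy" rev-list --count 'HEAD..@{upstream}')

echo "pull --ff-only: $ff_result (ahead=$ahead, behind=$behind)"
```

The pull is refused and `HEAD` stays where it was, so the release is simply
not deployed until someone repairs the clone by hand.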

## The right pattern

Quoting [Sitaram Chamarty](http://gitolite.com/the-list-and-irc/deploy.html):

> Here's what we expect from a deployment tool. Note the rule numbers --
> we'll be referring to some of them simply by number later.
>
> 1. All files in the branch being deployed should be copied to the
>     deployment directory.
>
> 2. Files that were deleted in the git repo since the last deployment
>     should get deleted from the deployment directory.
>
> 3. Any changes to tracked files in the deployment directory after the
>     last deployment should be ignored when following rules 1 and 2.
>
>     However, sometimes you might want to detect such changes and abort if
>     you found any.
>
> 4. Untracked files in the deploy directory should be left alone.
>
>     Again, some people might want to detect this and abort the deployment.

Sitaram's own documentation talks about how to satisfy these rules when
“deploying” straight out of a bare repository. That's unwise (not to mention
impractical) in most cases; deployment should use a dedicated clone of the
canonical repository.

I also disagree with rule 3, preferring to keep deployment-related changes
outside of tracked files. This makes it much easier to argue that the changes
introduced to configure the project for deployment do not introduce new bugs
or other surprise features.
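
If you do want the “detect and abort” variants mentioned under rules 3 and 4
in the quote above, something along these lines could run before a
deployment. `check_deploy_tree` and its exit codes are my own invention for
this sketch, as are the sandbox paths and file names used to exercise it.

```shell
set -eu

# check_deploy_tree DIR (hypothetical helper)
#   returns 0 if DIR is clean,
#           1 if tracked files differ from HEAD (rule 3 variant),
#           2 if untracked, non-ignored files exist (rule 4 variant).
check_deploy_tree() {
    tree=$1
    if ! git -C "$tree" diff --quiet HEAD --; then
        echo "tracked files were modified in $tree" >&2
        return 1
    fi
    if [ -n "$(git -C "$tree" ls-files --others --exclude-standard)" ]; then
        echo "untracked files found in $tree" >&2
        return 2
    fi
    return 0
}

# Exercise the helper in a throwaway sandbox; all names are illustrative.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$sandbox/deploy"
echo "v1" > "$sandbox/deploy/app.txt"
git -C "$sandbox/deploy" add app.txt
git -C "$sandbox/deploy" commit -q -m "release 1"

clean_status=0
check_deploy_tree "$sandbox/deploy" || clean_status=$?

echo "scratch" > "$sandbox/deploy/notes.txt"
untracked_status=0
check_deploy_tree "$sandbox/deploy" 2>/dev/null || untracked_status=$?

echo "edited on the server" > "$sandbox/deploy/app.txt"
modified_status=0
check_deploy_tree "$sandbox/deploy" 2>/dev/null || modified_status=$?

echo "clean=$clean_status untracked=$untracked_status modified=$modified_status"
```

A deployment script could call the helper first and stop on a non-zero
status, or treat status 2 as a warning if untracked files are expected.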

My deployment process, given a dedicated clone at `$DEPLOY_TREE`, is as
follows:

    cd "${DEPLOY_TREE}"
    git fetch --all
    git checkout --force "${TARGET}"
    # Following two lines only required if you use submodules
    git submodule sync
    git submodule update --init --recursive
    # Follow with actual deployment steps (run fabric/capistrano/make/etc)

`$TARGET` is either a tag name (`v1.2.1`) or a remote branch name
(`origin/master`), but could also be a commit hash or anything else Git
recognizes as a revision. This will detach `HEAD` in the `$DEPLOY_TREE`
repository, which is fine as no new changes should be authored in this
repository (so the local branches are irrelevant). The warning Git emits when
`HEAD` becomes detached is unimportant in this case.
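
Because `$TARGET` can take several forms, it can be useful to resolve it to a
full commit hash up front and fail early on a typo. The `resolve_target`
helper below is a hypothetical name of mine, and the sandbox, tag, and commit
message exist only for the demonstration.

```shell
set -eu

# resolve_target DIR REV (hypothetical helper)
#   prints the full commit hash REV refers to in DIR, or fails silently.
#   The ^{commit} suffix peels annotated tags down to the commit they point at.
resolve_target() {
    git -C "$1" rev-parse --verify --quiet "$2^{commit}"
}

# Throwaway sandbox; every path and name here is illustrative.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$sandbox/upstream"
git -C "$sandbox/upstream" commit -q --allow-empty -m "release 1.2.1"
git -C "$sandbox/upstream" tag -a v1.2.1 -m "release 1.2.1"
# Avoid assuming the default branch name (master, main, ...).
branch=$(git -C "$sandbox/upstream" symbolic-ref --short HEAD)
git clone -q "$sandbox/upstream" "$sandbox/deploy"

from_tag=$(resolve_target "$sandbox/deploy" v1.2.1)
from_branch=$(resolve_target "$sandbox/deploy" "origin/$branch")
from_hash=$(resolve_target "$sandbox/deploy" "$from_tag")

if resolve_target "$sandbox/deploy" no-such-release; then
    missing="resolved"
else
    missing="rejected"
fi

echo "tag:    $from_tag"
echo "branch: $from_branch"
echo "hash:   $from_hash"
echo "typo:   $missing"
```

Logging the resolved hash alongside the requested name also gives you a
record of exactly what a branch-tip deployment contained.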

The tracked contents of `$DEPLOY_TREE` will end up identical to the desired
commit, discarding local changes. The pattern above is very similar to what
most continuous integration servers use when building from Git repositories,
for much the same reason.
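
To check that claim rather than take it on faith, here is a small end-to-end
sketch. It dirties a deployment clone, runs the fetch and forced checkout from
the process above, and compares the result with the intended commit. The
sandbox, the `v2.0.0` tag, and the file names are invented for the example.

```shell
set -eu

# Throwaway sandbox; every path and name here is illustrative.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$sandbox/upstream"
echo "v1" > "$sandbox/upstream/app.txt"
git -C "$sandbox/upstream" add app.txt
git -C "$sandbox/upstream" commit -q -m "release 1"
git clone -q "$sandbox/upstream" "$sandbox/deploy"
DEPLOY_TREE="$sandbox/deploy"

# Drift in the deployment clone: an edited tracked file and an untracked one.
echo "edited on the server" > "$DEPLOY_TREE/app.txt"
echo "scratch" > "$DEPLOY_TREE/notes.txt"

# Upstream publishes a tagged release.
echo "v2" > "$sandbox/upstream/app.txt"
git -C "$sandbox/upstream" commit -q -am "release 2"
git -C "$sandbox/upstream" tag v2.0.0
TARGET=v2.0.0
intended=$(git -C "$sandbox/upstream" rev-parse HEAD)

# The deployment steps themselves (-q only silences progress and advice).
cd "${DEPLOY_TREE}"
git fetch -q --all
git checkout -q --force "${TARGET}"

deployed=$(git rev-parse HEAD)
app_contents=$(cat app.txt)

echo "intended: $intended"
echo "deployed: $deployed"
echo "app.txt:  $app_contents"
```

The tracked file is back to the released contents and `HEAD` is exactly the
tagged commit, while the untracked `notes.txt` is left alone, in line with
rule 4.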