author Owen Jacobson <owen.jacobson@grimoire.ca> 2013-06-06 14:17:10 -0400
committer Owen Jacobson <owen.jacobson@grimoire.ca> 2013-06-06 14:17:10 -0400
commit 4daae732c190c5fcb96619dda1151fcc272395db
tree a26212cdd91fae0da5fa35065558613a42b22800 /wiki
parent 0b43b9e3e64793f5a222a644ed5ab074d8fa1024
Started some work on git theory+practice. Needs exercises.
Diffstat (limited to 'wiki')
-rw-r--r-- wiki/git/theory-and-practice/index.md | 41
-rw-r--r-- wiki/git/theory-and-practice/objects.md | 125
-rw-r--r-- wiki/git/theory-and-practice/refs-and-names.md | 88
3 files changed, 254 insertions, 0 deletions
diff --git a/wiki/git/theory-and-practice/index.md b/wiki/git/theory-and-practice/index.md
new file mode 100644
index 0000000..f1e8311
--- /dev/null
+++ b/wiki/git/theory-and-practice/index.md
@@ -0,0 +1,41 @@
+# Git Internals 101
+
+Yeah, yeah, another article about "how Git works". There are tons of these
+already. Personally, I'm fond of Sitaram Chamarty's [fantastic series of
+articles](http://gitolite.com/master-toc.html) explaining Git from both ends,
+and of [Git for Computer
+Scientists](http://eagain.net/articles/git-for-computer-scientists/). Maybe
+you'd rather read those.
+
+This page was inspired by very specific, recurring issues I've run into while
+helping people use Git. I think Git's "porcelain" layer -- its user interface
+-- is terrible, and does a bad job of insulating non-expert users from Git's
+internals. While I'd love to fix that (and I do contribute to discussions on
+that front, too), we still have the `git(1)` UI right now and people still get
+into trouble with it right now.
+
+Git follows the New Jersey approach laid out in Richard Gabriel's [The Rise of
+"Worse is Better"]: given the choice between a simple implementation and a
+simple interface, Git chooses the simple implementation almost everywhere.
+This internal simplicity can give users the leverage to fix the problems that
+its horrible user interface leads them into, so these pages will focus on
+explaining the simple parts and giving users the tools to examine them.
+
+Throughout these articles, I've written "Git does X" a lot. Git is
+_incredibly_ configurable; read that as "Git does X _by default_". I'll try to
+call out relevant configuration options as I go, where it doesn't interrupt
+the flow of knowledge.
+
+* [Objects](objects)
+* [Refs and Names](refs-and-names)
+
+By the way, if you think you're just going to follow the
+[many](http://git-scm.com/documentation)
+[excellent](http://www.atlassian.com/git/tutorial)
+[git](http://try.github.io/levels/1/challenges/1)
+[tutorials](https://www.kernel.org/pub/software/scm/git/docs/gittutorial.html)
+out there and that you won't need this knowledge, well, you will. You can
+either learn it during a quiet time, when you can think and experiment, or you
+can learn it when something's gone wrong, and everyone's shouting at each
+other. Git's high-level interface doesn't do much to keep you on the sensible
+path, and you will eventually need to fix something.
\ No newline at end of file
diff --git a/wiki/git/theory-and-practice/objects.md b/wiki/git/theory-and-practice/objects.md
new file mode 100644
index 0000000..985e5dd
--- /dev/null
+++ b/wiki/git/theory-and-practice/objects.md
@@ -0,0 +1,125 @@
+# Objects
+
+Git's lowest level is a storage and naming system for things Git calls
+"objects". These objects hold the bulk of the data about files and projects
+tracked by Git: file contents, directory trees, commits, and so on. Every
+object is identified by a SHA-1 hash, which is derived from its contents.
+
+SHA-1 hashes are obnoxiously long, so Git allows you to substitute any unique
+prefix of a SHA-1 hash, so long as it's at least four characters long. If the
+hash `0b43b9e3e64793f5a222a644ed5ab074d8fa1024` is present in your repository,
+then Git commands will understand `0b43`, `0b43b9`, and other prefixes to all
+refer to the same object, so long as no other object has the same SHA-1
+prefix.
+
+## Blobs
+
+The contents of every file that's ever been stored in a Git repository are
+stored as `blob` objects. These objects are very simple: they contain the file
+contents, byte for byte.
+
+## Trees
+
+File contents (and trees, and Other Things we'll get to later) are tied
+together into a directory structure by `tree` objects. These objects contain a
+list of records, with one child per record. Each record contains a permissions
+field corresponding to the POSIX permissions mask of the object, a type, a
+SHA-1 for another object, and a name.
+
+A directory containing only files might be represented as the tree
+
+    100644 blob 511542ad6c97b28d720c697f7535897195de3318 config.md
+    100644 blob 801ddd5ae10d6282bbf36ccefdd0b052972aa8e2 integrate.md
+    100644 blob 61d28155862607c3d5d049e18c5a6903dba1f85e scratch.md
+    100644 blob d7a79c144c22775239600b332bfa120775bab341 survival.md
+
+while a directory with subdirectories would also have some `tree` children:
+
+    040000 tree f57ef2457a551b193779e21a50fb380880574f43 12factor
+    040000 tree 844697ce99e1ef962657ce7132460ad7a38b7584 authnz
+    100644 blob 54795f9b774547d554f5068985bbc6df7b128832 cool-urls-can-change.md
+    040000 tree fc3f39eb5d1a655374385870b8be56b202be7dd8 dev
+    040000 tree 22cbfb2c1d7b07432ea7706c36b0d6295563c69c devops
+    040000 tree 0b3e63b4f32c0c3acfbcf6ba28d54af4c2f0d594 git
+    040000 tree 5914fdcbd34e00e23e52ba8e8bdeba0902941d3f java
+    040000 tree 346f71a637a4f8933dc754fef02515a8809369c4 mysql
+    100644 blob b70520badbb8de6a74b84788a7fefe64a432c56d packaging-ideas.md
+    040000 tree 73ed6572345a368d20271ec5a3ffc2464ac8d270 people
+
+## Commits
+
+Blobs and trees are sufficient to store arbitrary directory trees in Git, and
+you could use them that way, but Git is mostly used as a revision-tracking
+system. Revisions and their history are represented by `commit` objects, which contain:
+
+ * The SHA-1 hash of the root `tree` object of the commit,
+ * Zero or more SHA-1 hashes for parent commits,
+ * The name and email address of the commit's "author",
+ * The name and email address of the commit's "committer",
+ * Timestamps representing when the commit was authored and committed, and
+ * A commit message.
+
+Commit objects' parent references form a directed acyclic graph; the subgraph
+reachable from a specific commit is that commit's _history_.
+
+Parent order is predictable: a merge commit's first parent is the commit that
+was checked out when `git merge` ran, followed by the commits merged into it.
+
+## Tags
+
+Git's revision-tracking system supports "tags", which are stable names for
+specific configurations. It also, uniquely, supports a concept called an
+"annotated tag", represented by the `tag` object type. These annotated tag
+objects contain:
+
+ * The type and SHA-1 hash of another object,
+ * The name and email address of the person who created the tag,
+ * A timestamp representing the moment the tag was created, and
+ * A tag message.
+
+## Anonymity
+
+There's a general theme to Git's object types: no object knows its own name.
+Every object only has a name in the context of some containing object, or in
+the context of [Git's refs mechanism](refs-and-names), which I'll get to
+shortly. This means that the same `blob` object can be reused for multiple
+files (or, more probably, the same file in multiple commits), if they happen
+to have the same contents.
+
+This also applies to tag objects, even though their role is part of a system
+for providing stable, meaningful names for commits.
+
+## Examining objects
+
+* `git cat-file <type> <sha1>`: decodes the object `<sha1>` and prints its
+ contents to stdout. This prints the object's contents in their raw form,
+ which is less than useful for `tree` objects.
+
+* `git cat-file -p <sha1>`: decodes the object `<sha1>` and pretty-prints it.
+ This pretty-printing stays close to the underlying disk format; it's most
+ useful for decoding `tree` objects.
+
+* `git show <sha1>`: decodes the object `<sha1>` and formats its contents to
+ stdout. For blobs, this is identical to what `git cat-file blob` would do,
+ but for trees, commits, and tags, the output is reformatted to be more
+ readable.
+
+## Storage
+
+Objects are stored in two places in Git: as "loose objects", and in "pack
+files". Newly-created objects are initially loose objects, for ease of
+manipulation; transferring objects to another repository or running certain
+administrative commands can cause them to be placed in pack files for faster
+transfer and for smaller storage.
+
+Loose objects are stored directly on the filesystem, in the Git repository's
+`objects` directory. Git takes a two-character prefix off of each object's
+SHA-1 hash, and uses that to pick a subdirectory of `objects` to store the
+object in. The remainder of the hash forms the filename. Loose objects are
+compressed with zlib, to conserve space, but the resulting directory tree can
+still be quite large.
+
+Packed objects are stored together in pack files, which live in the
+repository's `objects/pack` directory. These pack files are both compressed
+and delta-encoded, allowing groups of similar objects to be stored very
+compactly.
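
The naming and loose-storage rules described in `objects.md` can be sketched in a few lines of Python. This is a rough illustration, not Git's implementation. It relies on one detail the text above does not spell out: before hashing, Git prefixes the contents with a header of the form `<type> <size>` followed by a NUL byte. The function names are hypothetical, and the sketch only builds the bytes and the path rather than writing into a real repository.

```python
import hashlib
import zlib


def encode_object(obj_type: str, body: bytes) -> bytes:
    # Assumption (not stated in the text above): the stored form is
    # "<type> <size>" + NUL, followed by the raw contents.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return header + body


def object_id(obj_type: str, body: bytes) -> str:
    # The object's name is the SHA-1 of its encoded form, so identical
    # contents always produce the same name.
    return hashlib.sha1(encode_object(obj_type, body)).hexdigest()


def loose_object_path(oid: str) -> str:
    # A two-character prefix picks the subdirectory of `objects`;
    # the remainder of the hash is the filename.
    return f"objects/{oid[:2]}/{oid[2:]}"


def loose_object_bytes(obj_type: str, body: bytes) -> bytes:
    # Loose objects are zlib-compressed before being written to disk.
    return zlib.compress(encode_object(obj_type, body))


if __name__ == "__main__":
    oid = object_id("blob", b"hello\n")
    print(oid)
    print(loose_object_path(oid))
```

For the same bytes, the printed hash should match what `git hash-object` reports, which makes this a convenient way to check that a blob's name depends only on its contents and not on any filename.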
diff --git a/wiki/git/theory-and-practice/refs-and-names.md b/wiki/git/theory-and-practice/refs-and-names.md
new file mode 100644
index 0000000..94874c9
--- /dev/null
+++ b/wiki/git/theory-and-practice/refs-and-names.md
@@ -0,0 +1,88 @@
+# Refs and Names
+
+Git's [object system](objects) stores most of the data for projects tracked in
+Git, but only provides SHA-1 hashes. This is basically useless if you want to
+make practical use of Git, so Git also has a naming mechanism called "refs"
+that provides human-meaningful names for objects.
+
+There are two kinds of refs:
+
+* "Normal" refs, which are names that resolve directly to SHA-1 hashes. These
+ are the vast majority of refs in most repositories.
+
+* "Symbolic" refs, which are names that resolve to other refs. In most
+ repositories, only a few of these appear. (Circular references are possible
+ with symbolic refs. Git will refuse to resolve these.)
+
+Anywhere you could use a SHA-1, you can use a ref instead. Git interprets them
+identically, after resolving the ref down to the SHA-1.
+
+## Namespaces
+
+Every operation in Git that uses a name of some sort, including branching
+(branch names), tagging (tag names), fetching (remote-tracking branch names),
+and pushing (many kinds of name), expands those names to refs, using a namespace
+convention. The following namespaces are common:
+
+* `refs/heads/NAME`: branches. The branch name is the ref name with
+ `refs/heads/` removed. Names generally point to commits.
+
+* `refs/remotes/REMOTE/NAME`: "remote-tracking" branches. These are maintained
+ in tandem by `git remote` and `git fetch`, to cache the state of other
+ repositories. Names generally point to commits.
+
+* `refs/tags/NAME`: tags. The tag name is the ref name with `refs/tags/`
+ removed. Names generally point to commits or tag objects.
+
+* `refs/stash`: The most recent stash entry, as maintained by `git stash`.
+ (Older stash entries are kept in this ref's reflog.) This ref points to a
+ commit.
+
+Tools can invent new refs for their own purposes, or manipulate existing refs;
+the convention is that tools that use refs (which is, as I said, most of them)
+respect the state of the ref as if they'd created that state themselves,
+rather than sanity-checking the ref before using it.
+
+## Special refs
+
+There are a handful of special refs used by Git commands for their own
+operation. These refs do _not_ begin with `refs/`:
+
+* `HEAD`: the "current" commit for most operations. This is set when checking
+ out a commit, and many revision-related commands default to `HEAD` if not
+ given a revision to operate on. `HEAD` can either be a symbolic ref
+ (pointing to a branch ref) or a normal ref (pointing directly to a commit),
+ and is very frequently a symbolic ref.
+
+* `MERGE_HEAD`: during a merge, `MERGE_HEAD` resolves to the commit whose
+ history is being merged.
+
+* `ORIG_HEAD`: set by operations that change `HEAD` in potentially destructive
+ ways, to the commit `HEAD` resolved to before the change.
+
+* `CHERRY_PICK_HEAD`: set during `git cherry-pick` to the commit whose
+ changes are being copied.
+
+* `FETCH_HEAD`: set by the forms of `git fetch` that fetch a single ref, and
+ points to the commit the fetched ref pointed to.
+
+## Examining and manipulating refs
+
+The `git show-ref` command will list the refs in namespaces under `refs` in
+your repository, printing the SHA-1 hashes they resolve to. Pass `--head` to
+also include `HEAD`.
+
+The following commands can be used to manipulate refs directly:
+
+* `git update-ref <ref> <sha1>` forcibly sets `<ref>` to the passed `<sha1>`.
+
+* `git update-ref -d <ref>` deletes a ref.
+
+* `git symbolic-ref <ref>` prints the target of `<ref>`, if `<ref>` is a
+ symbolic ref. (It will fail with an error message for normal refs.)
+
+* `git symbolic-ref <ref> <target>` forcibly makes `<ref>` a symbolic ref
+ pointing to `<target>`.
+
+Additionally, you can see what ref a given name resolves to using `git
+rev-parse --symbolic-full-name <name>` or `git show-ref <name>`.
\ No newline at end of file
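
The distinction between normal and symbolic refs in `refs-and-names.md` can be modelled with a small Python sketch. It makes several assumptions that are not from the text: refs live in an in-memory dictionary rather than on disk, a symbolic ref is marked by a `ref: ` prefix on its value, and loops are caught with a set of visited names, which stands in for whatever check Git itself performs. `resolve_ref`, `short_name`, and `RefError` are hypothetical names.

```python
from typing import Dict

# Assumed marker for a symbolic ref's value in this toy model.
SYMREF_PREFIX = "ref: "


class RefError(Exception):
    """Raised when a ref is missing or a symbolic-ref chain loops."""


def resolve_ref(refs: Dict[str, str], name: str) -> str:
    # Follow symbolic refs until a normal ref (a SHA-1) is reached.
    seen = set()
    while True:
        if name in seen:
            # Circular symbolic refs are possible; refuse to resolve them.
            raise RefError(f"circular symbolic ref: {name}")
        seen.add(name)
        if name not in refs:
            raise RefError(f"no such ref: {name}")
        value = refs[name]
        if not value.startswith(SYMREF_PREFIX):
            return value
        name = value[len(SYMREF_PREFIX):]


def short_name(ref: str) -> str:
    # Branch and tag names are the ref name with the namespace removed.
    for prefix in ("refs/heads/", "refs/tags/", "refs/remotes/"):
        if ref.startswith(prefix):
            return ref[len(prefix):]
    return ref


if __name__ == "__main__":
    refs = {
        "HEAD": "ref: refs/heads/master",  # symbolic ref to a branch
        "refs/heads/master": "4daae732c190c5fcb96619dda1151fcc272395db",
    }
    print(resolve_ref(refs, "HEAD"))
    print(short_name("refs/heads/master"))
```

Here `HEAD` is a symbolic ref to a branch, so resolving it takes two steps; a detached `HEAD` would simply hold the SHA-1 itself and resolve in one.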