Notes Towards Detached Signatures in Git
Git supports a limited form of object authentication: specific object categories in Git's internal model can have GPG signatures embedded in them, allowing the authorship of the objects to be verified using GPG's underlying trust model. Tag signatures can be used to verify the authenticity and integrity of the snapshot associated with a tag, and the authenticity of the tag itself, filling a niche broadly similar to code signing in binary distribution systems. Commit signatures can be used to verify the authenticity of the snapshot associated with the commit, and the authorship of the commit itself. (Conventionally, commit signatures are assumed to also authenticate either the entire line of history leading to a commit, or the diff between the commit and its first parent, or both.)
Git's existing system has some tradeoffs.
- Signatures are embedded within the objects they sign. The signature is part of the object's identity; since Git is content-addressed, this means that an object can neither be retroactively signed nor retroactively stripped of its signature without modifying the object's identity. Git's distributed model means that these sorts of identity changes are both complicated and easily detected.
- Commit signatures are second-class citizens. They're a relatively recent addition to the Git suite, and both the implementation and the social conventions around them continue to evolve.
- Only some objects can be signed. While Git has relatively weak rules about workflow, the signature system assumes you're using one of Git's more widespread workflows by limiting your options to at most one signature, and by restricting signatures to tags and commits (leaving out blobs, trees, and refs).
I believe it would be useful from an authentication standpoint to add
"detached" signatures to Git, to allow users to make these tradeoffs
differently if desired. These signatures would be stored as separate (blob)
objects in a dedicated refs namespace, supporting retroactive signatures,
multiple signatures for a given object, "policy" signatures, and
authentication of arbitrary objects.
The following notes are partially guided by Git's one existing "detached
metadata" facility, git notes. Similarities are intentional; divergences
will be noted where appropriate. Detached signatures are meant to
interoperate with existing Git workflow as much as possible: in particular,
they can be fetched and pushed like any other bit of Git metadata.
A detached signature cryptographically binds three facts together into an assertion whose authenticity can be checked by anyone with access to the signatory's keys:
- An object (in the Git sense; a commit, tag, tree, or blob),
- A policy label, and
- A signatory (a person or agent making the assertion).
These assertions can be published separately from or in tandem with the objects they apply to.
Policies
Taking a hint from Monotone, every signature includes a "policy" identifying how the signature is meant to be interpreted. Policies are arbitrary strings; their meaning is entirely defined by tooling and convention, not by this draft.
This draft uses a single policy, author, for its examples. A signature
under the author policy implies that the signatory had a hand in the
authorship of the designated object. (This is compatible with existing
interpretations of signed tags and commits.) (Authorship under this model is
strictly self-attested: you can claim authorship of anything, and you cannot
assert anyone else's authorship.)
The Monotone documentation suggests a number of other useful policies related to testing and release status, automated build results, and numerous other factors. Use your imagination.
What's In A Signature
Detached signatures cover the uncompressed content of an object (without Git's type-and-length header), as given by
git cat-file <TYPE> <SHA1>
For most of Git's object types, this means that the signed content is plain
text. For tree objects, the signed content is the awful binary
representation of the tree, not the pretty representation given by git
ls-tree or git show.
Detached signatures include the "policy" identifier in the signed content, to
prevent others from tampering with policy choices via refs hackery. (This
will make more sense momentarily.) The policy identifier is prepended to the
signed content, terminated by a zero byte (as with Git's own type
identifiers, but without a length field as length checks are performed by
signing and again when the signature is stored in Git).
To generate the complete signable version of an object, use something equivalent to the following shell snippet:
# generate-signable POLICY TYPE SHA1
# Emit the policy name, a terminating zero byte, then the object's content.
function generate-signable() {
    printf '%s\0' "$1"
    git cat-file "$2" "$3"
}
(In the process of writing this, I discovered how hard it is to get Unix's C-derived shell tools to emit a zero byte.)
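As a minimal sketch of this framing (bash; the names frame_signable and read_policy are hypothetical and not part of any existing tool), the policy prefix can be added to any byte stream and read back from it. The sketch works on standard input so that it can be tried without a repository.

```shell
# frame_signable POLICY
# Read object content on stdin; write POLICY, a zero byte, then the content.
frame_signable() {
    printf '%s\0' "$1"
    cat
}

# read_policy
# Read signable content on stdin and print only the policy name,
# that is, everything before the first zero byte.
read_policy() {
    local policy
    IFS= read -r -d '' policy || return 1
    printf '%s\n' "$policy"
}
```

For instance, git cat-file commit HEAD | frame_signable author | read_policy prints author. Reading the policy back this way is one possible basis for checking the signed policy against the policy named in a ref.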
Signature Storage and Naming
We assume that a given key will sign an object at most once under any one policy.
Each signature is stored in an independent blob object in the repository it
applies to. The signature object (described above) is stored in Git, and its
hash recorded in refs/signatures/<POLICY>/<SUBJECT SHA1>/<SIGNER KEY
FINGERPRINT>.
# sign POLICY TYPE SHA1 FINGERPRINT
function sign() {
    local SIG_HASH=$(
        generate-signable "$1" "$2" "$3" |
        gpg --batch --no-tty --sign -u "$4" |
        git hash-object --stdin -w -t blob
    )
    # Point the signature ref at the newly stored signature blob.
    git update-ref "refs/signatures/$1/$3/$4" "$SIG_HASH"
}
Stored signatures always use the complete fingerprint to identify keys, to
minimize the risk of colliding key IDs while avoiding the need to store full
keys in the refs naming hierarchy.
The policy name can be reliably extracted from the ref, as the trailing part
has a fixed length (in both path segments and bytes) and each ref begins with
a fixed, constant prefix refs/signatures/.
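The extraction can be sketched in bash as follows. The function name parse_signature_ref is hypothetical, and the sketch assumes 40-hex-digit object names and 40-hex-digit key fingerprints; other lengths would need different patterns. Because the two trailing segments are peeled off from the right, a policy name containing slashes is preserved.

```shell
# parse_signature_ref REF
# Print the policy, subject SHA1, and key fingerprint of REF, one per line.
# Fails if REF does not have the refs/signatures/<POLICY>/<SHA1>/<FPR> shape.
parse_signature_ref() {
    local ref=$1 prefix='refs/signatures/'
    [[ $ref == "$prefix"* ]] || return 1
    local rest=${ref#"$prefix"}
    local fingerprint=${rest##*/}   # last path segment
    rest=${rest%/*}
    local sha=${rest##*/}           # second-to-last path segment
    local policy=${rest%/*}         # everything before that
    # Reject refs with too few segments or an empty policy.
    [[ -n $policy && $policy != "$rest" ]] || return 1
    [[ $sha =~ ^[0-9a-f]{40}$ ]] || return 1
    [[ $fingerprint =~ ^[0-9A-Fa-f]{40}$ ]] || return 1
    printf '%s\n' "$policy" "$sha" "$fingerprint"
}
```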
Signature Verification
Given a signature ref as described above, we can verify and authenticate the signature and bind it to the associated object and policy by performing the following check:
- Pick apart the ref into policy, SHA1, and key fingerprint parts.
- Reconstruct the signed body as above, using the policy name extracted from the ref.
- Retrieve the signature from the ref and combine it with the object itself.
- Verify that the policy in the stored signature matches the policy in the ref.
- Verify the signature with GPG:
# verify-gpg POLICY TYPE SHA1 FINGERPRINT
verify-gpg() {
    {
        git cat-file "$2" "$3"
        git cat-file blob "refs/signatures/$1/$3/$4"
    } | gpg --batch --no-tty --verify
}
- Verify that the fingerprint of the signing key matches the fingerprint in the ref itself.
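The fingerprint comparison in the last step can be sketched in bash, assuming GPG is run with its --status-fd option so that it reports a machine-readable VALIDSIG line. The function names below are hypothetical. The sketch takes the first fingerprint on the VALIDSIG line, which is the key that made the signature; when a subkey signs, the primary key's fingerprint appears as the final field instead, so a real check might need to look there.

```shell
# signer_fingerprint
# Read GPG status output on stdin; print the fingerprint from the first
# VALIDSIG line, or fail if there is none.
signer_fingerprint() {
    local tag keyword fpr rest
    while read -r tag keyword fpr rest; do
        if [ "$tag" = '[GNUPG:]' ] && [ "$keyword" = VALIDSIG ]; then
            printf '%s\n' "$fpr"
            return 0
        fi
    done
    return 1
}

# fingerprint_matches_ref REF
# Read GPG status output on stdin; succeed only if the signing key's
# fingerprint equals the last path segment of REF (case-insensitively).
fingerprint_matches_ref() {
    local expected=${1##*/} actual
    actual=$(signer_fingerprint) || return 1
    [ "$(printf '%s' "$actual" | tr 'a-f' 'A-F')" = \
      "$(printf '%s' "$expected" | tr 'a-f' 'A-F')" ]
}
```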
The specific rules for verifying the signature in GPG are left up to the user
to define; for example, some sites may want to auto-retrieve keys and use a
web of trust from some known roots to determine which keys are trusted, while
others may wish to maintain a specific, known keyring containing all signing
keys for each policy, and skip the web of trust entirely. This can be
accomplished via git-config, given some work, and via gpg.conf.
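As one illustration of the fixed-keyring approach, the bash sketch below builds GPG arguments that restrict verification to a single keyring per policy. The function name policy_gpg_args, the GPG_ARGS variable, and the <directory>/<policy>.gpg layout are all assumptions made for this example; only the --no-default-keyring and --keyring options come from GPG itself.

```shell
# policy_gpg_args POLICY KEYRING_DIR
# Set the global array GPG_ARGS to options that make gpg consult only
# KEYRING_DIR/POLICY.gpg (a hypothetical layout) and ignore the default keyring.
policy_gpg_args() {
    local policy=$1 keyring_dir=$2
    GPG_ARGS=(--batch --no-tty --no-default-keyring
              --keyring "$keyring_dir/$policy.gpg")
}

# Possible use, with a hypothetical keyring directory:
#   policy_gpg_args author /etc/git-signature-keyrings
#   gpg "${GPG_ARGS[@]}" --verify ...
```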
Distributing Signatures
Since each signature is stored in a separate ref, and since signatures are
not expected to be amended once published, the following refspec can be
used with git fetch and git push to distribute signatures:
refs/signatures/*:refs/signatures/*
Note the lack of a + decoration: normally we do not want modified
signatures to be replaced automatically; doing so should require explicit user action.
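A minimal sketch of wiring this refspec into a clone, assuming a remote named origin by default (the function name enable_signature_fetch is hypothetical):

```shell
# enable_signature_fetch [REMOTE]
# Add the non-forcing signature refspec to REMOTE's fetch configuration,
# so that a plain "git fetch REMOTE" also retrieves signatures.
enable_signature_fetch() {
    local remote=${1:-origin}
    git config --add "remote.$remote.fetch" \
        'refs/signatures/*:refs/signatures/*'
}
```

Signatures can then be published explicitly with git push origin 'refs/signatures/*:refs/signatures/*'. The sketch deliberately leaves remote.<name>.push alone, since setting it changes what a bare git push sends.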
Workflow Notes
There are two verification workflows for signatures: "static" verification, where the repository itself already contains all the refs and objects needed for signature verification, and "pre-receive" verification, where an object and its associated signature may be uploaded at the same time.
It is impractical to verify signatures on the fly from an update hook.
Only pre-receive hooks can usefully accept or reject ref changes depending
on whether the push contains a signature for the pushed objects. (Git does
not provide a good mechanism for ensuring that signature objects are pushed
before their subjects.) Correctly verifying object signatures during
pre-receive regardless of ref order is far too complicated to summarize
here.
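Only the first step of such a hook is sketched here, in bash: reading the whole list of proposed updates before judging any of them, so that the order in which refs arrive does not matter. The function name classify_updates is hypothetical, and the actual verification is left out.

```shell
# classify_updates
# Read pre-receive input ("<old> <new> <ref>" per line) on stdin and print
# each update as "signature <new> <ref>" or "subject <new> <ref>".
classify_updates() {
    local old new ref
    while read -r old new ref; do
        case $ref in
            refs/signatures/*) printf 'signature %s %s\n' "$new" "$ref" ;;
            *)                 printf 'subject %s %s\n' "$new" "$ref" ;;
        esac
    done
}
```

A hook could buffer this output, then check each subject line against the signature lines from the same push and against signatures already stored in the repository.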
Attacks
Lies of Omission
It's trivial to hide signatures by deleting the signature refs. Similarly, anyone with access to a repository can delete any or all detached signatures from it without otherwise invalidating the signed objects.
Since signatures are mostly static, sites following the recommended no-force policy for signature publication should only be affected if relatively recent signatures are deleted. Older signatures should be available in one or more of the repository users' local repositories; once created, a signature can be legitimately obtained from anywhere, not only from the original signatory.
The signature naming protocol is designed to resist most other forms of assertion tampering, but straight-up omission is hard to prevent.
Unwarranted Certification
The policy system allows any signatory to assert any policy. While
centralized signature distribution points such as "release" repositories can
make meaningful decisions about which signatures they choose to accept,
publish, and propagate, there's no way to determine after the fact whether a
policy assertion was obtained from a legitimate source or a malicious one
with no grounds for asserting the policy.
For example, I could, right now, sign an all-tests-pass policy assertion
for the Linux kernel. While there's no chance on Earth that the LKML team
would propagate that assertion, if I can convince you to fetch signatures
from my repository, you will fetch my bogus assertion. If all-tests-pass is
a meaningful policy assertion for the Linux kernel, then you will have very
few options besides believing that I assert that all tests have passed.
Ambiguous Policy
This is an ongoing problem with crypto policy systems and user interfaces generally, but this design does nothing to ensure that policies are interpreted uniformly by all participants in a repository. In particular, there's no mechanism described for distributing either prose or programmatic policy definitions and checks. All policy information is out of band.
Git already has ambiguity problems around commit signing, since a signature on a commit can be interpreted in several ways:
- I assert that this snapshot and commit message were authored as described in this commit's metadata. (In this interpretation, the signature's authenticity guarantees do not transitively apply to parents.)
- I assert that this snapshot and commit message were authored as described in this commit's metadata, based on exactly the parent commits described. (In this interpretation, the signature's authenticity guarantees do transitively apply to parents. This is the interpretation favoured by XXX LINK HERE XXX.)
- I assert that this diff and commit message were authored as described in this commit's metadata. (No assertions about the snapshot are made whatsoever, and assertions about parentage barely make sense at all. This meshes with widespread, diff-oriented policies.)
Grafts and Replacements
Git permits post-hoc replacement of arbitrary objects via both the grafts
system (via an untracked, non-distributed file in .git, though some
repositories distribute graft lists for end-users to manually apply) and the
replacements system (via refs/replace/<SHA1>, which can optionally be
fetched or pushed). The interaction between these two systems and signature
verification needs to be very closely considered; I've not yet done so.
Cases of note:
- Neither signature nor subject replaced - the "normal" case
- Signature not replaced, subject replaced (by graft, by replacement, by both)
- Signature replaced, subject not replaced
- Both signature and subject replaced
It's tempting to outright disable git replace during signing and
verification, but this will have surprising effects when signing a ref-ish
instead of a bare hash. Since this is the normal case, I think this merits
more thought. (I'm also not aware of a way to disable grafts without
modifying .git, and having the two replacement mechanisms treated
differently may be dangerous.)
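For experimenting with the "disable replacements" option, Git's --no-replace-objects flag makes a single command ignore refs/replace. The bash sketch below wraps it; the function name read_object_raw is hypothetical. It concerns replacement refs only and does nothing about grafts, so it does not resolve the asymmetry described above.

```shell
# read_object_raw TYPE OBJECT
# Print OBJECT's own content, ignoring any refs/replace entry for it.
read_object_raw() {
    git --no-replace-objects cat-file "$1" "$2"
}
```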
No Signed Refs
I mentioned early in this draft that Git's existing signing system doesn't support signing refs themselves; since refs are an important piece of Git's workflow ecosystem, this may be a major omission. Unfortunately, this proposal doesn't address that.
Possible Refinements
- Monotone's certificate system is key+value based, rather than label-based. This might be useful; while small pools of related values can be asserted using mutually exclusive policy labels (whose mutual exclusion is a matter of local interpretation), larger pools of related values rapidly become impractical under the proposed system.
  For example, this proposal would be inappropriate for directly asserting third-party authorship; the asserted author would have to appear in the policy name itself, exposing the user to a potentially very large number of similar policy labels.
- Ref signing via a manifest (a tree constellation whose paths are ref names and whose blobs sign the refs' values). Consider cribbing DNSSEC here for things like lightweight absence assertions, too.
- Describe how this should interact with commit-duplicating and commit-rewriting workflows.