user-manual: rewrite object database discussion
Rewrite the introduction. Rewrite each section completely to make them work in the new order, to add some examples, and to move plumbing commands (like git-commit-tree) to the following chapter.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

committed by J. Bruce Fields
parent 513d419c59
commit 1bbf1c7900
@ -2723,46 +2723,44 @@ database>> and the <<def_index,index>>.
|
|||||||
The Object Database
|
The Object Database
|
||||||
-------------------
|
-------------------
|
||||||
|
|
||||||
The object database is literally just a content-addressable collection
|
|
||||||
of objects. All objects are named by their content, which is
|
|
||||||
approximated by the SHA1 hash of the object itself. Objects may refer
|
|
||||||
to other objects (by referencing their SHA1 hash), and so you can
|
|
||||||
build up a hierarchy of objects.
|
|
||||||
|
|
||||||
All objects have a statically determined "type" which is
|
We already saw in <<understanding-commits>> that all commits are stored
|
||||||
determined at object creation time, and which identifies the format of
|
under a 40-digit "object name". In fact, all the information needed to
|
||||||
the object (i.e. how it is used, and how it can refer to other
|
represent the history of a project is stored in objects with such names.
|
||||||
objects). There are currently four different object types: "blob",
|
In each case the name is calculated by taking the SHA1 hash of the
|
||||||
"tree", "commit", and "tag".
|
contents of the object. The SHA1 hash is a cryptographic hash function.
|
||||||
|
What that means to us is that it is infeasible to find two different
|
||||||
|
objects with the same name. This has a number of advantages; among
|
||||||
|
others:
|
||||||
|
|
||||||
A <<def_blob_object,"blob" object>> cannot refer to any other object,
|
- Git can quickly determine whether two objects are identical or not,
|
||||||
and is, as the name implies, a pure storage object containing some
|
just by comparing names.
|
||||||
user data. It is used to actually store the file data, i.e. a blob
|
- Since object names are computed the same way in every repository, the
|
||||||
object is associated with some particular version of some file.
|
same content stored in two repositories will always be stored under
|
||||||
|
the same name.
|
||||||
|
- Git can detect errors when it reads an object, by checking that the
|
||||||
|
object's name is still the SHA1 hash of its contents.
|
||||||
|
|
||||||
A <<def_tree_object,"tree" object>> is an object that ties one or more
|
(See <<object-details>> for the details of the object formatting and
|
||||||
"blob" objects into a directory structure. In addition, a tree object
|
SHA1 calculation.)
|
||||||
can refer to other tree objects, thus creating a directory hierarchy.
|
|
||||||
|
|
||||||
A <<def_commit_object,"commit" object>> ties such directory hierarchies
|
There are four different types of objects: "blob", "tree", "commit", and
|
||||||
together into a <<def_DAG,directed acyclic graph>> of revisions - each
|
"tag".
|
||||||
"commit" is associated with exactly one tree (the directory hierarchy at
|
|
||||||
the time of the commit). In addition, a "commit" refers to one or more
|
|
||||||
"parent" commit objects that describe the history of how we arrived at
|
|
||||||
that directory hierarchy.
|
|
||||||
|
|
||||||
As a special case, a commit object with no parents is called the "root"
|
- A <<def_blob_object,"blob" object>> is used to store file data.
|
||||||
commit, and is the point of an initial project commit. Each project
|
- A <<def_tree_object,"tree" object>> is an object that ties one or more
|
||||||
must have at least one root, and while you can tie several different
|
"blob" objects into a directory structure. In addition, a tree object
|
||||||
root objects together into one project by creating a commit object which
|
can refer to other tree objects, thus creating a directory hierarchy.
|
||||||
has two or more separate roots as its ultimate parents, that's probably
|
- A <<def_commit_object,"commit" object>> ties such directory hierarchies
|
||||||
just going to confuse people. So aim for the notion of "one root object
|
together into a <<def_DAG,directed acyclic graph>> of revisions - each
|
||||||
per project", even if git itself does not enforce that.
|
commit contains the object name of exactly one tree designating the
|
||||||
|
directory hierarchy at the time of the commit. In addition, a commit
|
||||||
A <<def_tag_object,"tag" object>> symbolically identifies and can be
|
refers to "parent" commit objects that describe the history of how we
|
||||||
used to sign other objects. It contains the identifier and type of
|
arrived at that directory hierarchy.
|
||||||
another object, a symbolic name (of course!) and, optionally, a
|
- A <<def_tag_object,"tag" object>> symbolically identifies and can be
|
||||||
signature.
|
used to sign other objects. It contains the object name and type of
|
||||||
|
another object, a symbolic name (of course!) and, optionally, a
|
||||||
|
signature.
|
||||||
|
|
||||||
The object types in some more detail:
|
The object types in some more detail:
|
||||||
|
|
||||||
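Review note on the hunk above: the rewritten introduction says an object's name is the SHA1 hash of its contents and defers the details to <<object-details>>. As a minimal sketch of that calculation, assuming the usual layout in which the hashed bytes are a header of the form "<type> <size>" followed by a NUL byte and then the contents, the name can be reproduced in Python. The function name `git_object_name` is made up for this illustration and is not part of git.

```python
import hashlib


def git_object_name(obj_type: str, content: bytes) -> str:
    """Return the 40-digit object name for `content` stored as `obj_type`.

    Assumption: the hash covers "<type> <size>\\0" followed by the content.
    `git_object_name` is a hypothetical helper, not a git API.
    """
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The same bytes always produce the same name, in any repository.
    print(git_object_name("blob", b"hello\n"))
    print(git_object_name("blob", b"hello\n") == git_object_name("blob", b"hello\n"))
```

Because the name depends only on the type and the bytes, two repositories that store the same file contents arrive at the same blob name without coordinating, which is the second advantage listed in the new text.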
@ -2770,109 +2768,142 @@ The object types in some more detail:
|
|||||||
Commit Object
|
Commit Object
|
||||||
~~~~~~~~~~~~~
|
~~~~~~~~~~~~~
|
||||||
|
|
||||||
The "commit" object is an object that introduces the notion of
|
The "commit" object links a physical state of a tree with a description
|
||||||
history into the picture. In contrast to the other objects, it
|
of how we got there and why. Use the --pretty=raw option to
|
||||||
doesn't just describe the physical state of a tree, it describes how
|
gitlink:git-show[1] or gitlink:git-log[1] to examine your favorite
|
||||||
we got there, and why.
|
commit:
|
||||||
|
|
||||||
A "commit" is defined by the tree-object that it results in, the
|
------------------------------------------------
|
||||||
parent commits (zero, one or more) that led up to that point, and a
|
$ git show -s --pretty=raw 2be7fcb476
|
||||||
comment on what happened. Again, a commit is not trusted per se:
|
commit 2be7fcb4764f2dbcee52635b91fedb1b3dcf7ab4
|
||||||
the contents are well-defined and "safe" due to the cryptographically
|
tree fb3a8bdd0ceddd019615af4d57a53f43d8cee2bf
|
||||||
strong signatures at all levels, but there is no reason to believe
|
parent 257a84d9d02e90447b149af58b271c19405edb6a
|
||||||
that the tree is "good" or that the merge information makes sense.
|
author Dave Watson <dwatson@mimvista.com> 1187576872 -0400
|
||||||
The parents do not have to actually have any relationship with the
|
committer Junio C Hamano <gitster@pobox.com> 1187591163 -0700
|
||||||
result, for example.
|
|
||||||
|
|
||||||
Note on commits: unlike some SCM's, commits do not contain
|
Fix misspelling of 'suppress' in docs
|
||||||
rename information or file mode change information. All of that is
|
|
||||||
implicit in the trees involved (the result tree, and the result trees
|
|
||||||
of the parents), and describing that makes no sense in this idiotic
|
|
||||||
file manager.
|
|
||||||
|
|
||||||
A commit is created with gitlink:git-commit-tree[1] and
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
||||||
its data can be accessed by gitlink:git-cat-file[1].
|
------------------------------------------------
|
||||||
|
|
||||||
|
As you can see, a commit is defined by:
|
||||||
|
|
||||||
|
- a tree: The SHA1 name of a tree object (as defined below), representing
|
||||||
|
the contents of a directory at a certain point in time.
|
||||||
|
- parent(s): The SHA1 name of some number of commits which represent the
|
||||||
|
immediately previous step(s) in the history of the project. The
|
||||||
|
example above has one parent; merge commits may have more than
|
||||||
|
one. A commit with no parents is called a "root" commit, and
|
||||||
|
represents the initial revision of a project. Each project must have
|
||||||
|
at least one root. A project can also have multiple roots, though
|
||||||
|
that isn't common (or necessarily a good idea).
|
||||||
|
- an author: The name of the person responsible for this change, together
|
||||||
|
with its date.
|
||||||
|
- a committer: The name of the person who actually created the commit,
|
||||||
|
with the date it was done. This may be different from the author, for
|
||||||
|
example, if the author was someone who wrote a patch and emailed it
|
||||||
|
to the person who used it to create the commit.
|
||||||
|
- a comment describing this commit.
|
||||||
|
|
||||||
|
Note that a commit does not itself contain any information about what
|
||||||
|
actually changed; all changes are calculated by comparing the contents
|
||||||
|
of the tree referred to by this commit with the trees associated with
|
||||||
|
its parents. In particular, git does not attempt to record file renames
|
||||||
|
explicitly, though it can identify cases where the existence of the same
|
||||||
|
file data at changing paths suggests a rename. (See, for example, the
|
||||||
|
-M option to gitlink:git-diff[1]).
|
||||||
|
|
||||||
|
A commit is usually created by gitlink:git-commit[1], which creates a
|
||||||
|
commit whose parent is normally the current HEAD, and whose tree is
|
||||||
|
taken from the content currently stored in the index.
|
||||||
|
|
||||||
[[tree-object]]
|
[[tree-object]]
|
||||||
Tree Object
|
Tree Object
|
||||||
~~~~~~~~~~~
|
~~~~~~~~~~~
|
||||||
|
|
||||||
The next hierarchical object type is the "tree" object. A tree object
|
The ever-versatile gitlink:git-show[1] command can also be used to
|
||||||
is a list of mode/name/blob data, sorted by name. Alternatively, the
|
examine tree objects, but gitlink:git-ls-tree[1] will give you more
|
||||||
mode data may specify a directory mode, in which case instead of
|
details:
|
||||||
naming a blob, that name is associated with another TREE object.
|
|
||||||
|
|
||||||
Like the "blob" object, a tree object is uniquely determined by the
|
------------------------------------------------
|
||||||
set contents, and so two separate but identical trees will always
|
$ git ls-tree fb3a8bdd0ce
|
||||||
share the exact same object. This is true at all levels, i.e. it's
|
100644 blob 63c918c667fa005ff12ad89437f2fdc80926e21c .gitignore
|
||||||
true for a "leaf" tree (which does not refer to any other trees, only
|
100644 blob 5529b198e8d14decbe4ad99db3f7fb632de0439d .mailmap
|
||||||
blobs) as well as for a whole subdirectory.
|
100644 blob 6ff87c4664981e4397625791c8ea3bbb5f2279a3 COPYING
|
||||||
|
040000 tree 2fb783e477100ce076f6bf57e4a6f026013dc745 Documentation
|
||||||
|
100755 blob 3c0032cec592a765692234f1cba47dfdcc3a9200 GIT-VERSION-GEN
|
||||||
|
100644 blob 289b046a443c0647624607d471289b2c7dcd470b INSTALL
|
||||||
|
100644 blob 4eb463797adc693dc168b926b6932ff53f17d0b1 Makefile
|
||||||
|
100644 blob 548142c327a6790ff8821d67c2ee1eff7a656b52 README
|
||||||
|
...
|
||||||
|
------------------------------------------------
|
||||||
|
|
||||||
For that reason a "tree" object is just a pure data abstraction: it
|
As you can see, a tree object contains a list of entries, each with a
|
||||||
has no history, no signatures, no verification of validity, except
|
mode, object type, SHA1 name, and name, sorted by name. It represents
|
||||||
that since the contents are again protected by the hash itself, we can
|
the contents of a single directory tree.
|
||||||
trust that the tree is immutable and its contents never change.
|
|
||||||
|
|
||||||
So you can trust the contents of a tree to be valid, the same way you
|
The object type may be a blob, representing the contents of a file, or
|
||||||
can trust the contents of a blob, but you don't know where those
|
another tree, representing the contents of a subdirectory. Since trees
|
||||||
contents 'came' from.
|
and blobs, like all other objects, are named by the SHA1 hash of their
|
||||||
|
contents, two trees have the same SHA1 name if and only if their
|
||||||
|
contents (including, recursively, the contents of all subdirectories)
|
||||||
|
are identical. This allows git to quickly determine the differences
|
||||||
|
between two related tree objects, since it can ignore any entries with
|
||||||
|
identical object names.
|
||||||
|
|
||||||
Side note on trees: since a "tree" object is a sorted list of
|
(Note: in the presence of submodules, trees may also have commits as
|
||||||
"filename+content", you can create a diff between two trees without
|
entries. See gitlink:git-submodule[1] and gitlink:gitmodules.txt[1]
|
||||||
actually having to unpack two trees. Just ignore all common parts,
|
for partial documentation.)
|
||||||
and your diff will look right. In other words, you can effectively
|
|
||||||
(and efficiently) tell the difference between any two random trees by
|
|
||||||
O(n) where "n" is the size of the difference, rather than the size of
|
|
||||||
the tree.
|
|
||||||
|
|
||||||
Side note 2 on trees: since the name of a "blob" depends entirely and
|
Note that the files all have mode 644 or 755: git actually only pays
|
||||||
exclusively on its contents (i.e. there are no names or permissions
|
attention to the executable bit.
|
||||||
involved), you can see trivial renames or permission changes by
|
|
||||||
noticing that the blob stayed the same. However, renames with data
|
|
||||||
changes need a smarter "diff" implementation.
|
|
||||||
|
|
||||||
A tree is created with gitlink:git-write-tree[1] and
|
|
||||||
its data can be accessed by gitlink:git-ls-tree[1].
|
|
||||||
Two trees can be compared with gitlink:git-diff-tree[1].
|
|
||||||
|
|
||||||
[[blob-object]]
|
[[blob-object]]
|
||||||
Blob Object
|
Blob Object
|
||||||
~~~~~~~~~~~
|
~~~~~~~~~~~
|
||||||
|
|
||||||
A "blob" object is nothing but a binary blob of data, and doesn't
|
You can use gitlink:git-show[1] to examine the contents of a blob; take,
|
||||||
refer to anything else. There is no signature or any other
|
for example, the blob in the entry for "COPYING" from the tree above:
|
||||||
verification of the data, so while the object is consistent (it 'is'
|
|
||||||
indexed by its sha1 hash, so the data itself is certainly correct), it
|
|
||||||
has absolutely no other attributes. No name associations, no
|
|
||||||
permissions. It is purely a blob of data (i.e. normally "file
|
|
||||||
contents").
|
|
||||||
|
|
||||||
In particular, since the blob is entirely defined by its data, if two
|
------------------------------------------------
|
||||||
files in a directory tree (or in multiple different versions of the
|
$ git show 6ff87c4664
|
||||||
repository) have the same contents, they will share the same blob
|
|
||||||
object. The object is totally independent of its location in the
|
|
||||||
directory tree, and renaming a file does not change the object that
|
|
||||||
file is associated with in any way.
|
|
||||||
|
|
||||||
A blob is typically created when gitlink:git-update-index[1]
|
Note that the only valid version of the GPL as far as this project
|
||||||
is run, and its data can be accessed by gitlink:git-cat-file[1].
|
is concerned is _this_ particular version of the license (ie v2, not
|
||||||
|
v2.2 or v3.x or whatever), unless explicitly otherwise stated.
|
||||||
|
...
|
||||||
|
------------------------------------------------
|
||||||
|
|
||||||
|
A "blob" object is nothing but a binary blob of data. It doesn't refer
|
||||||
|
to anything else or have attributes of any kind.
|
||||||
|
|
||||||
|
Since the blob is entirely defined by its data, if two files in a
|
||||||
|
directory tree (or in multiple different versions of the repository)
|
||||||
|
have the same contents, they will share the same blob object. The object
|
||||||
|
is totally independent of its location in the directory tree, and
|
||||||
|
renaming a file does not change the object that file is associated with.
|
||||||
|
|
||||||
|
Note that any tree or blob object can be examined using
|
||||||
|
gitlink:git-show[1] with the <revision>:<path> syntax. This can
|
||||||
|
sometimes be useful for browsing the contents of a tree that is not
|
||||||
|
currently checked out.
|
||||||
|
|
||||||
[[trust]]
|
[[trust]]
|
||||||
Trust
|
Trust
|
||||||
~~~~~
|
~~~~~
|
||||||
|
|
||||||
An aside on the notion of "trust". Trust is really outside the scope
|
If you receive the SHA1 name of a blob from one source, and its contents
|
||||||
of "git", but it's worth noting a few things. First off, since
|
from another (possibly untrusted) source, you can still trust that those
|
||||||
everything is hashed with SHA1, you 'can' trust that an object is
|
contents are correct as long as the SHA1 name agrees. This is because
|
||||||
intact and has not been messed with by external sources. So the name
|
the SHA1 is designed so that it is infeasible to find different contents
|
||||||
of an object uniquely identifies a known state - just not a state that
|
that produce the same hash.
|
||||||
you may want to trust.
|
|
||||||
|
|
||||||
Furthermore, since the SHA1 signature of a commit refers to the
|
Similarly, you need only trust the SHA1 name of a top-level tree object
|
||||||
SHA1 signatures of the tree it is associated with and the signatures
|
to trust the contents of the entire directory that it refers to, and if
|
||||||
of the parent, a single named commit specifies uniquely a whole set
|
you receive the SHA1 name of a commit from a trusted source, then you
|
||||||
of history, with full contents. You can't later fake any step of the
|
can easily verify the entire history of commits reachable through
|
||||||
way once you have the name of a commit.
|
parents of that commit, and all of the contents of the trees referred
|
||||||
|
to by those commits.
|
||||||
|
|
||||||
So to introduce some real trust in the system, the only thing you need
|
So to introduce some real trust in the system, the only thing you need
|
||||||
to do is to digitally sign just 'one' special note, which includes the
|
to do is to digitally sign just 'one' special note, which includes the
|
||||||
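Review note on the hunk above: the new Tree Object text says git can quickly find the differences between two related trees because it can ignore any entries with identical object names. The sketch below illustrates that idea on a toy in-memory model, not on git's real storage format. The `Store` mapping, the short object names such as "T1" and "x1", and the function `changed_paths` are all assumptions invented for this example.

```python
from typing import Dict, Iterator, Optional, Tuple

Entry = Tuple[str, str]      # (object type, object name)
Tree = Dict[str, Entry]      # entry name -> entry
Store = Dict[str, Tree]      # tree object name -> tree (toy model)


def changed_paths(store: Store, old: Optional[str], new: Optional[str],
                  prefix: str = "") -> Iterator[str]:
    """Yield file paths that differ between two trees in the toy store."""
    if old == new:
        return  # same name means same contents, so skip the whole subtree
    old_entries = store[old] if old is not None else {}
    new_entries = store[new] if new is not None else {}
    for name in sorted(set(old_entries) | set(new_entries)):
        a = old_entries.get(name)
        b = new_entries.get(name)
        if a == b:
            continue  # identical entry, nothing below it can differ
        path = prefix + name
        a_tree = a[1] if a is not None and a[0] == "tree" else None
        b_tree = b[1] if b is not None and b[0] == "tree" else None
        if a_tree is not None or b_tree is not None:
            # Descend only into subdirectories whose names differ.
            yield from changed_paths(store, a_tree, b_tree, path + "/")
        if (a is not None and a[0] != "tree") or (b is not None and b[0] != "tree"):
            yield path  # a file was added, removed, or changed at this path


if __name__ == "__main__":
    store: Store = {
        "T1": {"README": ("blob", "b1"), "Documentation": ("tree", "D1")},
        "T2": {"README": ("blob", "b1"), "Documentation": ("tree", "D2")},
        "D1": {"a.txt": ("blob", "x1"), "b.txt": ("blob", "y1")},
        "D2": {"a.txt": ("blob", "x2"), "b.txt": ("blob", "y1")},
    }
    print(list(changed_paths(store, "T1", "T2")))
```

The work done is proportional to the parts of the two trees that actually differ: unchanged files and unchanged subdirectories are dismissed with a single name comparison.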
@ -2891,23 +2922,31 @@ To assist in this, git also provides the tag object...
|
|||||||
Tag Object
|
Tag Object
|
||||||
~~~~~~~~~~
|
~~~~~~~~~~
|
||||||
|
|
||||||
Git provides the "tag" object to simplify creating, managing and
|
A tag object contains an object, object type, tag name, the name of the
|
||||||
exchanging symbolic and signed tokens. The "tag" object at its
|
person ("tagger") who created the tag, and a message, which may contain
|
||||||
simplest simply symbolically identifies another object by containing
|
a signature, as can be seen using gitlink:git-cat-file[1]:
|
||||||
the sha1, type and symbolic name.
|
|
||||||
|
|
||||||
However it can optionally contain additional signature information
|
------------------------------------------------
|
||||||
(which git doesn't care about as long as there's less than 8k of
|
$ git cat-file tag v1.5.0
|
||||||
it). This can then be verified externally to git.
|
object 437b1b20df4b356c9342dac8d38849f24ef44f27
|
||||||
|
type commit
|
||||||
|
tag v1.5.0
|
||||||
|
tagger Junio C Hamano <junkio@cox.net> 1171411200 +0000
|
||||||
|
|
||||||
Note that despite the tag features, "git" itself only handles content
|
GIT 1.5.0
|
||||||
integrity; the trust framework (and signature provision and
|
-----BEGIN PGP SIGNATURE-----
|
||||||
verification) has to come from outside.
|
Version: GnuPG v1.4.6 (GNU/Linux)
|
||||||
|
|
||||||
A tag is created with gitlink:git-mktag[1],
|
iD8DBQBF0lGqwMbZpPMRm5oRAuRiAJ9ohBLd7s2kqjkKlq1qqC57SbnmzQCdG4ui
|
||||||
its data can be accessed by gitlink:git-cat-file[1],
|
nLE/L9aUXdWeTFPron96DLA=
|
||||||
and the signature can be verified by
|
=2E+0
|
||||||
gitlink:git-verify-tag[1].
|
-----END PGP SIGNATURE-----
|
||||||
|
------------------------------------------------
|
||||||
|
|
||||||
|
See the gitlink:git-tag[1] command to learn how to create and verify tag
|
||||||
|
objects. (Note that gitlink:git-tag[1] can also be used to create
|
||||||
|
"lightweight tags", which are not tag objects at all, but just simple
|
||||||
|
references in .git/refs/tags/).
|
||||||
|
|
||||||
|
|
||||||
[[the-index]]
|
[[the-index]]
|
||||||
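Review note on the hunk above: the raw commit and tag listings added by this patch share one shape, a run of "key value" header lines, a blank line, and then a free-form message. A minimal sketch of reading that shape follows. It assumes every header fits on one line, which holds for the examples in this patch but is not guaranteed for every object, and the function name `parse_object_text` is hypothetical.

```python
from typing import List, Tuple


def parse_object_text(raw: str) -> Tuple[List[Tuple[str, str]], str]:
    """Split raw commit or tag text into (headers, message).

    Assumption: one header per line, as in the examples in this patch.
    Headers are kept as a list because keys such as "parent" may repeat.
    """
    head, _, message = raw.partition("\n\n")
    headers = []
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        headers.append((key, value))
    return headers, message


if __name__ == "__main__":
    # Header lines copied from the v1.5.0 tag shown in the patch.
    raw_tag = (
        "object 437b1b20df4b356c9342dac8d38849f24ef44f27\n"
        "type commit\n"
        "tag v1.5.0\n"
        "tagger Junio C Hamano <junkio@cox.net> 1171411200 +0000\n"
        "\n"
        "GIT 1.5.0\n"
    )
    headers, message = parse_object_text(raw_tag)
    for key, value in headers:
        print(key, "=>", value)
    print(message, end="")
```

The same function applies to the output of `git show -s --pretty=raw` only after removing the leading "commit" line and the indentation that command adds to the message, so treat it as an illustration of the layout rather than a parser for command output.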
@ -2978,6 +3017,24 @@ scripts using a smaller core of low-level git commands. These can still
|
|||||||
be useful when doing unusual things with git, or just as a way to
|
be useful when doing unusual things with git, or just as a way to
|
||||||
understand its inner workings.
|
understand its inner workings.
|
||||||
|
|
||||||
|
[[object-manipulation]]
|
||||||
|
Object access and manipulation
|
||||||
|
------------------------------
|
||||||
|
|
||||||
|
The gitlink:git-cat-file[1] command can show the contents of any object,
|
||||||
|
though the higher-level gitlink:git-show[1] is usually more useful.
|
||||||
|
|
||||||
|
The gitlink:git-commit-tree[1] command allows constructing commits with
|
||||||
|
arbitrary parents and trees.
|
||||||
|
|
||||||
|
A tree can be created with gitlink:git-write-tree[1] and its data can be
|
||||||
|
accessed by gitlink:git-ls-tree[1]. Two trees can be compared with
|
||||||
|
gitlink:git-diff-tree[1].
|
||||||
|
|
||||||
|
A tag is created with gitlink:git-mktag[1], and the signature can be
|
||||||
|
verified by gitlink:git-verify-tag[1], though it is normally simpler to
|
||||||
|
use gitlink:git-tag[1] for both.
|
||||||
|
|
||||||
[[the-workflow]]
|
[[the-workflow]]
|
||||||
The Workflow
|
The Workflow
|
||||||
------------
|
------------
|
||||||
|