 63903d0e4e
			
		
	
	63903d0e4e
	
	
	
		
			
			A typofix to the documentation of a feature already in the release. * nd/split-index: index-format.txt: add a missing closing quote
		
			
				
	
	
		
			236 lines
		
	
	
		
			7.9 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			236 lines
		
	
	
		
			7.9 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| Git index format
 | |
| ================
 | |
| 
 | |
| == The Git index file has the following format
 | |
| 
 | |
|   All binary numbers are in network byte order. Version 2 is described
 | |
|   here unless stated otherwise.
 | |
| 
 | |
|    - A 12-byte header consisting of
 | |
| 
 | |
|      4-byte signature:
 | |
|        The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
 | |
| 
 | |
|      4-byte version number:
 | |
|        The current supported versions are 2, 3 and 4.
 | |
| 
 | |
|      32-bit number of index entries.
 | |
| 
 | |
|    - A number of sorted index entries (see below).
 | |
| 
 | |
|    - Extensions
 | |
| 
 | |
|      Extensions are identified by signature. Optional extensions can
 | |
|      be ignored if Git does not understand them.
 | |
| 
 | |
|      Git currently supports cached tree and resolve undo extensions.
 | |
| 
 | |
|      4-byte extension signature. If the first byte is 'A'..'Z' the
 | |
|      extension is optional and can be ignored.
 | |
| 
 | |
|      32-bit size of the extension
 | |
| 
 | |
|      Extension data
 | |
| 
 | |
|    - 160-bit SHA-1 over the content of the index file before this
 | |
|      checksum.
 | |
| 
 | |
| == Index entry
 | |
| 
 | |
|   Index entries are sorted in ascending order on the name field,
 | |
|   interpreted as a string of unsigned bytes (i.e. memcmp() order, no
 | |
|   localization, no special casing of directory separator '/'). Entries
 | |
|   with the same name are sorted by their stage field.
 | |
| 
 | |
|   32-bit ctime seconds, the last time a file's metadata changed
 | |
|     this is stat(2) data
 | |
| 
 | |
|   32-bit ctime nanosecond fractions
 | |
|     this is stat(2) data
 | |
| 
 | |
|   32-bit mtime seconds, the last time a file's data changed
 | |
|     this is stat(2) data
 | |
| 
 | |
|   32-bit mtime nanosecond fractions
 | |
|     this is stat(2) data
 | |
| 
 | |
|   32-bit dev
 | |
|     this is stat(2) data
 | |
| 
 | |
|   32-bit ino
 | |
|     this is stat(2) data
 | |
| 
 | |
|   32-bit mode, split into (high to low bits)
 | |
| 
 | |
|     4-bit object type
 | |
|       valid values in binary are 1000 (regular file), 1010 (symbolic link)
 | |
|       and 1110 (gitlink)
 | |
| 
 | |
|     3-bit unused
 | |
| 
 | |
|     9-bit unix permission. Only 0755 and 0644 are valid for regular files.
 | |
|     Symbolic links and gitlinks have value 0 in this field.
 | |
| 
 | |
|   32-bit uid
 | |
|     this is stat(2) data
 | |
| 
 | |
|   32-bit gid
 | |
|     this is stat(2) data
 | |
| 
 | |
|   32-bit file size
 | |
|     This is the on-disk size from stat(2), truncated to 32-bit.
 | |
| 
 | |
|   160-bit SHA-1 for the represented object
 | |
| 
 | |
|   A 16-bit 'flags' field split into (high to low bits)
 | |
| 
 | |
|     1-bit assume-valid flag
 | |
| 
 | |
|     1-bit extended flag (must be zero in version 2)
 | |
| 
 | |
|     2-bit stage (during merge)
 | |
| 
 | |
|     12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
 | |
|     is stored in this field.
 | |
| 
 | |
|   (Version 3 or later) A 16-bit field, only applicable if the
 | |
|   "extended flag" above is 1, split into (high to low bits).
 | |
| 
 | |
|     1-bit reserved for future
 | |
| 
 | |
|     1-bit skip-worktree flag (used by sparse checkout)
 | |
| 
 | |
|     1-bit intent-to-add flag (used by "git add -N")
 | |
| 
 | |
|     13-bit unused, must be zero
 | |
| 
 | |
|   Entry path name (variable length) relative to top level directory
 | |
|     (without leading slash). '/' is used as path separator. The special
 | |
|     path components ".", ".." and ".git" (without quotes) are disallowed.
 | |
|     Trailing slash is also disallowed.
 | |
| 
 | |
|     The exact encoding is undefined, but the '.' and '/' characters
 | |
|     are encoded in 7-bit ASCII and the encoding cannot contain a NUL
 | |
|     byte (iow, this is a UNIX pathname).
 | |
| 
 | |
|   (Version 4) In version 4, the entry path name is prefix-compressed
 | |
|     relative to the path name for the previous entry (the very first
 | |
|     entry is encoded as if the path name for the previous entry is an
 | |
|     empty string).  At the beginning of an entry, an integer N in the
 | |
|     variable width encoding (the same encoding as the offset is encoded
 | |
|     for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
 | |
|     by a NUL-terminated string S.  Removing N bytes from the end of the
 | |
|     path name for the previous entry, and replacing it with the string S
 | |
|     yields the path name for this entry.
 | |
| 
 | |
|   1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes
 | |
|   while keeping the name NUL-terminated.
 | |
| 
 | |
|   (Version 4) In version 4, the padding after the pathname does not
 | |
|   exist.
 | |
| 
 | |
|   Interpretation of index entries in split index mode is completely
 | |
|   different. See below for details.
 | |
| 
 | |
| == Extensions
 | |
| 
 | |
| === Cached tree
 | |
| 
 | |
|   Cached tree extension contains pre-computed hashes for trees that can
 | |
|   be derived from the index. It helps speed up tree object generation
 | |
|   from index for a new commit.
 | |
| 
 | |
|   When a path is updated in index, the path must be invalidated and
 | |
|   removed from tree cache.
 | |
| 
 | |
|   The signature for this extension is { 'T', 'R', 'E', 'E' }.
 | |
| 
 | |
|   A series of entries fill the entire extension; each of which
 | |
|   consists of:
 | |
| 
 | |
|   - NUL-terminated path component (relative to its parent directory);
 | |
| 
 | |
|   - ASCII decimal number of entries in the index that is covered by the
 | |
|     tree this entry represents (entry_count);
 | |
| 
 | |
|   - A space (ASCII 32);
 | |
| 
 | |
|   - ASCII decimal number that represents the number of subtrees this
 | |
|     tree has;
 | |
| 
 | |
|   - A newline (ASCII 10); and
 | |
| 
 | |
|   - 160-bit object name for the object that would result from writing
 | |
|     this span of index as a tree.
 | |
| 
 | |
|   An entry can be in an invalidated state and is represented by having
 | |
|   a negative number in the entry_count field. In this case, there is no
 | |
|   object name and the next entry starts immediately after the newline.
 | |
|   When writing an invalid entry, -1 should always be used as entry_count.
 | |
| 
 | |
|   The entries are written out in the top-down, depth-first order.  The
 | |
|   first entry represents the root level of the repository, followed by the
 | |
|   first subtree---let's call this A---of the root level (with its name
 | |
|   relative to the root level), followed by the first subtree of A (with
 | |
|   its name relative to A), ...
 | |
| 
 | |
| === Resolve undo
 | |
| 
 | |
|   A conflict is represented in the index as a set of higher stage entries.
 | |
|   When a conflict is resolved (e.g. with "git add path"), these higher
 | |
|   stage entries will be removed and a stage-0 entry with proper resolution
 | |
|   is added.
 | |
| 
 | |
|   When these higher stage entries are removed, they are saved in the
 | |
|   resolve undo extension, so that conflicts can be recreated (e.g. with
 | |
|   "git checkout -m"), in case users want to redo a conflict resolution
 | |
|   from scratch.
 | |
| 
 | |
|   The signature for this extension is { 'R', 'E', 'U', 'C' }.
 | |
| 
 | |
|   A series of entries fill the entire extension; each of which
 | |
|   consists of:
 | |
| 
 | |
|   - NUL-terminated pathname the entry describes (relative to the root of
 | |
|     the repository, i.e. full pathname);
 | |
| 
 | |
|   - Three NUL-terminated ASCII octal numbers, entry mode of entries in
 | |
|     stage 1 to 3 (a missing stage is represented by "0" in this field);
 | |
|     and
 | |
| 
 | |
|   - At most three 160-bit object names of the entry in stages from 1 to 3
 | |
|     (nothing is written for a missing stage).
 | |
| 
 | |
| === Split index
 | |
| 
 | |
|   In split index mode, the majority of index entries could be stored
 | |
|   in a separate file. This extension records the changes to be made on
 | |
|   top of that to produce the final index.
 | |
| 
 | |
|   The signature for this extension is { 'l', 'i', 'n', 'k' }.
 | |
| 
 | |
|   The extension consists of:
 | |
| 
 | |
|   - 160-bit SHA-1 of the shared index file. The shared index file path
 | |
|     is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
 | |
|     index does not require a shared index file.
 | |
| 
 | |
|   - An ewah-encoded delete bitmap, each bit represents an entry in the
 | |
|     shared index. If a bit is set, its corresponding entry in the
 | |
|     shared index will be removed from the final index.  Note, because
 | |
|     a delete operation changes index entry positions, but we do need
 | |
|     original positions in replace phase, it's best to just mark
 | |
|     entries for removal, then do a mass deletion after replacement.
 | |
| 
 | |
|   - An ewah-encoded replace bitmap, each bit represents an entry in
 | |
|     the shared index. If a bit is set, its corresponding entry in the
 | |
|     shared index will be replaced with an entry in this index
 | |
|     file. All replaced entries are stored in sorted order in this
 | |
|     index. The first "1" bit in the replace bitmap corresponds to the
 | |
|     first index entry, the second "1" bit to the second entry and so
 | |
|     on. Replaced entries may have empty path names to save space.
 | |
| 
 | |
|   The remaining index entries after replaced ones will be added to the
 | |
|   final index. These added entries are also sorted by entry name then
 | |
|   stage.
 |