The index format does not currently allow for sparse directory entries.
This violates some expectations that older versions of Git or
third-party tools might not understand. We need an indicator inside the
index file to warn these tools to not interact with a sparse index
unless they are aware of sparse directory entries.

Add a new _required_ index extension, 'sdir', that indicates that the
index may contain sparse directory entries. This allows us to continue
to use the differences in index formats 2, 3, and 4 before we create a
new index version 5 in a later change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Git index format
================

== The Git index file has the following format

  All binary numbers are in network byte order.
  In a repository using the traditional SHA-1, checksums and object IDs
  (object names) mentioned below are all computed using SHA-1.  Similarly,
  in SHA-256 repositories, these values are computed using SHA-256.
  Version 2 is described here unless stated otherwise.

   - A 12-byte header consisting of

     4-byte signature:
       The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")

     4-byte version number:
       The currently supported versions are 2, 3 and 4.

     32-bit number of index entries.

   - A number of sorted index entries (see below).

   - Extensions

     Extensions are identified by signature. Optional extensions can
     be ignored if Git does not understand them.

     Git currently supports cache tree and resolve undo extensions.

     4-byte extension signature. If the first byte is 'A'..'Z' the
     extension is optional and can be ignored.

     32-bit size of the extension

     Extension data

   - Hash checksum over the content of the index file before this checksum.
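
  As an illustration of the 12-byte header layout described above, the
  following is a minimal sketch of a header reader. The function name
  `parse_index_header` is hypothetical and not part of Git; the sketch only
  assumes the signature, version, and entry-count fields listed above.

```python
import struct


def parse_index_header(data):
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    if len(data) < 12:
        raise ValueError("an index header needs 12 bytes")
    # ">" selects network (big-endian) byte order: 4-byte signature,
    # 32-bit version number, 32-bit number of index entries.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("bad signature: %r" % signature)
    if version not in (2, 3, 4):
        raise ValueError("unsupported index version: %d" % version)
    return version, entry_count
```

  A caller would typically pass the first bytes read from `$GIT_DIR/index`
  and use the returned entry count to drive the entry parser.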

== Index entry

  Index entries are sorted in ascending order on the name field,
  interpreted as a string of unsigned bytes (i.e. memcmp() order, no
  localization, no special casing of directory separator '/'). Entries
  with the same name are sorted by their stage field.
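
  The ordering rule above can be sketched with plain byte strings, since
  Python compares `bytes` values as sequences of unsigned bytes. The
  `(name, stage)` tuples and the sample paths are made up for illustration.

```python
def entry_sort_key(entry):
    """Sort key for a hypothetical (name, stage) pair: name bytes, then stage."""
    name, stage = entry
    return (name, stage)


# A conflicted path "a/b" (stages 1 to 3) next to some stage-0 neighbours.
entries = [
    (b"a/b", 3),
    (b"a.txt", 0),
    (b"a/b", 1),
    (b"a-b", 0),
    (b"a", 0),
    (b"a/b", 2),
]
ordered = sorted(entries, key=entry_sort_key)
```

  Because '/' (0x2f) gets no special treatment, "a-b" (0x2d) and "a.txt"
  (0x2e) both sort before "a/b" in this sample.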

  An index entry typically represents a file. However, if sparse-checkout
  is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the
  `extensions.sparseIndex` extension is enabled, then the index may
  contain entries for directories outside of the sparse-checkout definition.
  These entries have mode `040000`, have the `SKIP_WORKTREE` bit set, and
  have a path that ends in a directory separator.

  32-bit ctime seconds, the last time a file's metadata changed
    this is stat(2) data

  32-bit ctime nanosecond fractions
    this is stat(2) data

  32-bit mtime seconds, the last time a file's data changed
    this is stat(2) data

  32-bit mtime nanosecond fractions
    this is stat(2) data

  32-bit dev
    this is stat(2) data

  32-bit ino
    this is stat(2) data

  32-bit mode, split into (high to low bits)

    4-bit object type
      valid values in binary are 1000 (regular file), 1010 (symbolic link)
      and 1110 (gitlink)

    3-bit unused

    9-bit unix permission. Only 0755 and 0644 are valid for regular files.
    Symbolic links and gitlinks have value 0 in this field.

  32-bit uid
    this is stat(2) data

  32-bit gid
    this is stat(2) data

  32-bit file size
    This is the on-disk size from stat(2), truncated to 32-bit.

  Object name for the represented object

  A 16-bit 'flags' field split into (high to low bits)

    1-bit assume-valid flag

    1-bit extended flag (must be zero in version 2)

    2-bit stage (during merge)

    12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
    is stored in this field.

  (Version 3 or later) A 16-bit field, only applicable if the
  "extended flag" above is 1, split into (high to low bits).

    1-bit reserved for future

    1-bit skip-worktree flag (used by sparse checkout)

    1-bit intent-to-add flag (used by "git add -N")

    13-bit unused, must be zero
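
  To make the bit layouts of the mode, 'flags', and extended flags fields
  concrete, here is a small sketch that unpacks them with shifts and masks.
  The function names and dictionary keys are hypothetical; the masks follow
  directly from the "high to low bits" order given above.

```python
def decode_mode(mode):
    """Split the low 16 bits of the mode into object type and permissions."""
    return {
        "object_type": (mode >> 12) & 0xF,  # 4-bit object type
        "permissions": mode & 0o777,        # 9-bit unix permission
    }


def decode_flags(flags):
    """Split the 16-bit 'flags' field, highest bit first."""
    return {
        "assume_valid": bool(flags & 0x8000),
        "extended": bool(flags & 0x4000),
        "stage": (flags >> 12) & 0x3,
        "name_length": flags & 0x0FFF,  # 0xFFF means "0xFFF or longer"
    }


def decode_extended_flags(ext):
    """Split the version 3+ extended 16-bit field, highest bit first."""
    return {
        "skip_worktree": bool(ext & 0x4000),
        "intent_to_add": bool(ext & 0x2000),
    }
```

  With this sketch, the sparse directory mode `040000` decodes to object
  type 0100 in binary with all permission bits clear.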

  Entry path name (variable length) relative to top level directory
    (without leading slash). '/' is used as path separator. The special
    path components ".", ".." and ".git" (without quotes) are disallowed.
    Trailing slash is also disallowed, except for the sparse directory
    entries described above.

    The exact encoding is undefined, but the '.' and '/' characters
    are encoded in 7-bit ASCII and the encoding cannot contain a NUL
    byte (iow, this is a UNIX pathname).

  (Version 4) In version 4, the entry path name is prefix-compressed
    relative to the path name for the previous entry (the very first
    entry is encoded as if the path name for the previous entry is an
    empty string).  At the beginning of an entry, an integer N in the
    variable width encoding (the same encoding that is used for the
    offset of OFS_DELTA pack entries; see pack-format.txt) is stored,
    followed by a NUL-terminated string S.  Removing N bytes from the end
    of the path name for the previous entry, and replacing it with the
    string S yields the path name for this entry.
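
  The version 4 prefix compression can be sketched as follows. The helper
  names are hypothetical. The integer decoder implements the OFS_DELTA
  offset encoding (7 payload bits per byte, most significant group first,
  with one added to the running value before each continuation byte).

```python
def decode_offset_varint(buf, pos):
    """Decode an OFS_DELTA-style integer at buf[pos]; return (value, new_pos)."""
    byte = buf[pos]
    pos += 1
    value = byte & 0x7F
    while byte & 0x80:  # high bit set: another byte follows
        byte = buf[pos]
        pos += 1
        value = ((value + 1) << 7) | (byte & 0x7F)
    return value, pos


def decode_v4_path(buf, pos, previous):
    """Rebuild a version 4 path name from N, S and the previous path name."""
    strip, pos = decode_offset_varint(buf, pos)
    end = buf.index(b"\0", pos)  # S is NUL-terminated
    suffix = buf[pos:end]
    if strip > len(previous):
        raise ValueError("cannot strip more bytes than the previous name has")
    path = previous[:len(previous) - strip] + suffix
    return path, end + 1
```

  For the very first entry, `previous` would be the empty byte string, so N
  is expected to be 0 and S carries the whole path.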

  1-8 NUL bytes as necessary to pad the entry to a multiple of eight bytes
  while keeping the name NUL-terminated.

  (Version 4) In version 4, the padding after the pathname does not
  exist.
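
  The padding rule for versions 2 and 3 can be worked out from the field
  sizes listed earlier: ten 32-bit fields (40 bytes), the object name, the
  16-bit flags, and the optional 16-bit extended flags. The sketch below
  assumes that layout; the function names are hypothetical.

```python
def ondisk_entry_size(name_len, hash_len=20, extended=False):
    """Size in bytes of a version 2/3 entry, including its NUL padding."""
    fixed = 40 + hash_len + 2 + (2 if extended else 0)
    unpadded = fixed + name_len
    # Adding 8 before rounding down guarantees at least one NUL byte.
    return (unpadded + 8) & ~7


def padding_length(name_len, hash_len=20, extended=False):
    """Number of NUL bytes (1 to 8) written after the path name."""
    fixed = 40 + hash_len + 2 + (2 if extended else 0)
    return ondisk_entry_size(name_len, hash_len, extended) - (fixed + name_len)
```

  With SHA-1 and no extended flags, a one-byte name needs a single NUL,
  while a two-byte name already fills 64 bytes and therefore takes a full
  eight NUL bytes.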

  Interpretation of index entries in split index mode is completely
  different. See below for details.

== Extensions

=== Cache tree

  Since the index does not record entries for directories, the cache
  entries cannot describe tree objects that already exist in the object
  database for regions of the index that are unchanged from an existing
  commit. The cache tree extension stores a recursive tree structure that
  describes the trees that already exist and completely match sections of
  the cache entries. This speeds up tree object generation from the index
  for a new commit by only computing the trees that are "new" to that
  commit. It also assists when comparing the index to another tree, such
  as `HEAD^{tree}`, since sections of the index can be skipped when a tree
  comparison demonstrates equality.

  The recursive tree structure uses nodes that store a number of cache
  entries, a list of subnodes, and an object ID (OID). The OID references
  the existing tree for that node, if it is known to exist. The subnodes
  correspond to subdirectories that themselves have cache tree nodes. The
  number of cache entries corresponds to the number of cache entries in
  the index that describe paths within that tree's directory.

  The cache tree extension tracks the full directory structure, but this
  is generally smaller than the full cache entry list.

  When a path is updated in the index, Git invalidates all nodes of the
  recursive cache tree corresponding to the parent directories of that
  path. We store these tree nodes as being "invalid" by using "-1" as the
  number of cache entries. Invalid nodes still store a span of index
  entries, allowing Git to focus its efforts when reconstructing a full
  cache tree.

  The signature for this extension is { 'T', 'R', 'E', 'E' }.

  A series of entries fills the entire extension; each entry
  consists of:

  - NUL-terminated path component (relative to its parent directory);

  - ASCII decimal number of entries in the index that are covered by the
    tree this entry represents (entry_count);

  - A space (ASCII 32);

  - ASCII decimal number that represents the number of subtrees this
    tree has;

  - A newline (ASCII 10); and

  - Object name for the object that would result from writing this span
    of index as a tree.

  An entry can be in an invalidated state and is represented by having
  a negative number in the entry_count field. In this case, there is no
  object name and the next entry starts immediately after the newline.
  When writing an invalid entry, -1 should always be used as entry_count.

  The entries are written out in the top-down, depth-first order.  The
  first entry represents the root level of the repository, followed by the
  first subtree--let's call this A--of the root level (with its name
  relative to the root level), followed by the first subtree of A (with
  its name relative to A), and so on. The specified number of subtrees
  indicates when the current level of the recursive stack is complete.
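
  The depth-first layout lends itself to a recursive reader. The following
  is a minimal sketch, assuming the extension payload has already been
  located and that object names are `hash_len` raw bytes; the function name
  and the dictionary keys are hypothetical.

```python
def parse_cache_tree(data, hash_len=20):
    """Parse a TREE extension payload into nested dictionaries."""

    def parse_node(pos):
        end = data.index(b"\0", pos)
        name = data[pos:end]  # path component, empty for the root
        pos = end + 1
        space = data.index(b" ", pos)
        entry_count = int(data[pos:space].decode("ascii"))
        pos = space + 1
        newline = data.index(b"\n", pos)
        subtree_count = int(data[pos:newline].decode("ascii"))
        pos = newline + 1
        oid = None
        if entry_count >= 0:  # invalidated nodes carry no object name
            oid = data[pos:pos + hash_len]
            pos += hash_len
        children = []
        for _ in range(subtree_count):
            child, pos = parse_node(pos)
            children.append(child)
        node = {
            "name": name,
            "entry_count": entry_count,
            "oid": oid,
            "children": children,
        }
        return node, pos

    root, _ = parse_node(0)
    return root
```

  The subtree count is what ends each level of the recursion, mirroring the
  "recursive stack" wording above.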

=== Resolve undo

  A conflict is represented in the index as a set of higher stage entries.
  When a conflict is resolved (e.g. with "git add path"), these higher
  stage entries will be removed and a stage-0 entry with proper resolution
  is added.

  When these higher stage entries are removed, they are saved in the
  resolve undo extension, so that conflicts can be recreated (e.g. with
  "git checkout -m"), in case users want to redo a conflict resolution
  from scratch.

  The signature for this extension is { 'R', 'E', 'U', 'C' }.

  A series of entries fills the entire extension; each entry
  consists of:

  - NUL-terminated pathname the entry describes (relative to the root of
    the repository, i.e. full pathname);

  - Three NUL-terminated ASCII octal numbers, entry mode of entries in
    stage 1 to 3 (a missing stage is represented by "0" in this field);
    and

  - At most three object names of the entry in stages from 1 to 3
    (nothing is written for a missing stage).
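
  A reader for this layout might look like the sketch below. It assumes raw
  object names of `hash_len` bytes; the function name and the tuple shape
  of the result are hypothetical.

```python
def parse_resolve_undo(data, hash_len=20):
    """Parse a REUC payload into (path, modes, oids) tuples."""
    records = []
    pos = 0
    while pos < len(data):
        end = data.index(b"\0", pos)
        path = data[pos:end]
        pos = end + 1
        modes = []
        for _ in range(3):  # stages 1, 2 and 3
            end = data.index(b"\0", pos)
            modes.append(int(data[pos:end].decode("ascii"), 8))
            pos = end + 1
        oids = []
        for mode in modes:
            if mode == 0:  # missing stage: no object name was written
                oids.append(None)
            else:
                oids.append(data[pos:pos + hash_len])
                pos += hash_len
        records.append((path, modes, oids))
    return records
```

  A record whose stage-1 mode is 0 describes a path that had no common
  ancestor version in the conflict.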

=== Split index

  In split index mode, the majority of index entries could be stored
  in a separate file. This extension records the changes to be made on
  top of that to produce the final index.

  The signature for this extension is { 'l', 'i', 'n', 'k' }.

  The extension consists of:

  - Hash of the shared index file. The shared index file path
    is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
    index does not require a shared index file.

  - An ewah-encoded delete bitmap; each bit represents an entry in the
    shared index. If a bit is set, its corresponding entry in the
    shared index will be removed from the final index.  Note that a
    delete operation changes index entry positions, but the replace
    phase needs the original positions, so it is best to just mark
    entries for removal and then do a mass deletion after replacement.

  - An ewah-encoded replace bitmap; each bit represents an entry in
    the shared index. If a bit is set, its corresponding entry in the
    shared index will be replaced with an entry in this index
    file. All replaced entries are stored in sorted order in this
    index. The first "1" bit in the replace bitmap corresponds to the
    first index entry, the second "1" bit to the second entry and so
    on. Replaced entries may have empty path names to save space.

  The remaining index entries after replaced ones will be added to the
  final index. These added entries are also sorted by entry name then
  stage.
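
  The replace-then-delete order described above can be sketched with plain
  Python lists. This is only an illustration: the bitmaps are modelled as
  lists of booleans (EWAH decoding is out of scope), entries are
  hypothetical `(name, stage, oid)` tuples, and the final `sorted()` call is
  a simplification rather than a description of how Git merges the lists.

```python
def merge_split_index(shared, delete_bits, replace_bits, own_entries):
    """Combine shared-index entries with the entries of the split index."""
    result = list(shared)
    used = 0
    # Replace first, while positions still match the shared index.
    for i, entry in enumerate(shared):
        if replace_bits[i]:
            replacement = own_entries[used]
            used += 1
            # A replaced entry may have an empty name; reuse the shared one.
            name = replacement[0] or entry[0]
            result[i] = (name,) + tuple(replacement[1:])
    # Mass deletion after replacement.
    kept = [e for i, e in enumerate(result) if not delete_bits[i]]
    # Whatever follows the replacements is added to the final index.
    added = list(own_entries[used:])
    return sorted(kept + added, key=lambda e: (e[0], e[1]))
```

  In this model, the n-th set bit of `replace_bits` consumes the n-th item
  of `own_entries`, matching the pairing rule stated for the replace bitmap.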

== Untracked cache

  Untracked cache saves the untracked file list and necessary data to
  verify the cache. The signature for this extension is { 'U', 'N',
  'T', 'R' }.

  The extension starts with

  - A sequence of NUL-terminated strings, preceded by the size of the
    sequence in variable width encoding. Each string describes the
    environment where the cache can be used.

  - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
    ctime field until "file size".

  - Stat data of core.excludesFile

  - 32-bit dir_flags (see struct dir_struct)

  - Hash of $GIT_DIR/info/exclude. A null hash means the file
    does not exist.

  - Hash of core.excludesFile. A null hash means the file does
    not exist.

  - NUL-terminated string of per-dir exclude file name. This usually
    is ".gitignore".

  - The number of following directory blocks, variable width
    encoding. If this number is zero, the extension ends here with a
    following NUL.

  - A number of directory blocks in depth-first-search order, each
    consisting of

    - The number of untracked entries, variable width encoding.

    - The number of sub-directory blocks, variable width encoding.

    - The directory name terminated by NUL.

    - A number of untracked file/dir names terminated by NUL.

The remaining data of each directory block is grouped by type:

  - An ewah bitmap; the n-th bit marks whether the n-th directory has
    valid untracked cache entries.

  - An ewah bitmap; the n-th bit records "check-only" bit of
    read_directory_recursive() for the n-th directory.

  - An ewah bitmap; the n-th bit indicates whether hash and stat data
    is valid for the n-th directory and exists in the next data.

  - An array of stat data. The n-th data corresponds with the n-th
    "one" bit in the previous ewah bitmap.

  - An array of hashes. The n-th hash corresponds with the n-th "one" bit
    in the previous ewah bitmap.

  - One NUL.

== File System Monitor cache

  The file system monitor cache tracks files for which the core.fsmonitor
  hook has told us about changes.  The signature for this extension is
  { 'F', 'S', 'M', 'N' }.

  The extension starts with

  - 32-bit version number: the current supported versions are 1 and 2.

  - (Version 1)
    64-bit time: the extension data reflects all changes through the given
    time which is stored as the nanoseconds elapsed since midnight,
    January 1, 1970.

  - (Version 2)
    A NUL-terminated string: an opaque token defined by the file system
    monitor application.  The extension data reflects all changes relative
    to that token.

  - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.

  - An ewah bitmap; the n-th bit indicates whether the n-th index entry
    is not CE_FSMONITOR_VALID.

== End of Index Entry

  The End of Index Entry (EOIE) is used to locate the end of the variable
  length index entries and the beginning of the extensions. Code can take
  advantage of this to quickly locate the index extensions without having
  to parse through all of the index entries.

  Because it must be able to be loaded before the variable length cache
  entries and other index extensions, this extension must be written last.
  The signature for this extension is { 'E', 'O', 'I', 'E' }.

  The extension consists of:

  - 32-bit offset to the end of the index entries

  - Hash over the extension types and their sizes (but not
    their contents).  E.g. if we have a "TREE" extension that is N bytes
    long, a "REUC" extension that is M bytes long, followed by "EOIE",
    then the hash would be:

    Hash("TREE" + <binary representation of N> +
         "REUC" + <binary representation of M>)
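
  The EOIE hash formula can be sketched with `hashlib`. The sketch assumes
  that the "binary representation" of each size is the same 32-bit network
  byte order value stored in the extension header; the function name and
  the `(signature, size)` pair list are hypothetical.

```python
import hashlib
import struct


def eoie_hash(extensions, algo="sha1"):
    """Hash the signatures and sizes of the extensions written before EOIE."""
    hasher = hashlib.new(algo)
    for signature, size in extensions:
        hasher.update(signature)                # 4-byte extension signature
        hasher.update(struct.pack(">I", size))  # assumed 32-bit big-endian size
    return hasher.digest()
```

  For a SHA-256 repository the same sketch would be called with
  `algo="sha256"`.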

== Index Entry Offset Table

  The Index Entry Offset Table (IEOT) is used to help address the CPU
  cost of loading the index by enabling multi-threaded conversion of
  cache entries from the on-disk format to the in-memory format.
  The signature for this extension is { 'I', 'E', 'O', 'T' }.

  The extension consists of:

  - 32-bit version (currently 1)

  - A number of index offset entries each consisting of:

    - 32-bit offset from the beginning of the file to the first cache entry
      in this block of entries.

    - 32-bit count of cache entries in this block
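
  Reading this table amounts to unpacking a version number followed by
  pairs of 32-bit integers. A minimal sketch, with a hypothetical function
  name, assuming the payload has already been extracted:

```python
import struct


def parse_ieot(data):
    """Return a list of (offset, entry_count) blocks from an IEOT payload."""
    (version,) = struct.unpack_from(">I", data, 0)
    if version != 1:
        raise ValueError("unsupported IEOT version: %d" % version)
    body = data[4:]
    if len(body) % 8:
        raise ValueError("IEOT payload is not a whole number of entries")
    return [struct.unpack_from(">II", body, i) for i in range(0, len(body), 8)]
```

  Each returned block could then be handed to a separate worker, which is
  the use the text above has in mind.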

== Sparse Directory Entries

  When using sparse-checkout in cone mode, some entire directories within
  the index can be summarized by pointing to a tree object instead of the
  entire expanded list of paths within that tree. An index containing such
  entries is a "sparse index". Index format versions 4 and earlier were not
  implemented with such entries in mind. Thus, for these versions, an
  index containing sparse directory entries will include this extension
  with signature { 's', 'd', 'i', 'r' }. As with the split-index extension,
  tools should avoid interacting with a sparse index unless they understand
  this extension.
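
  A tool that does not understand sparse directory entries can protect
  itself by walking the extension headers and refusing any unknown
  extension whose signature does not start with 'A'..'Z'. The sketch below
  assumes the caller has already isolated the bytes between the last index
  entry and the trailing checksum; the function name and the `KNOWN` set are
  hypothetical.

```python
import struct

# Signatures this hypothetical reader knows how to handle.
KNOWN = frozenset([b"TREE", b"REUC", b"link", b"UNTR", b"FSMN", b"EOIE", b"IEOT"])


def check_extensions(ext_region, known=KNOWN):
    """Return the signatures found; raise on an unknown required extension."""
    seen = []
    pos = 0
    while pos < len(ext_region):
        signature, size = struct.unpack_from(">4sI", ext_region, pos)
        pos += 8 + size  # skip the header and the extension data
        optional = b"A"[0] <= signature[0] <= b"Z"[0]
        if signature not in known and not optional:
            raise ValueError("unknown required extension: %r" % signature)
        seen.append(signature)
    return seen
```

  Because 'sdir' begins with a lowercase letter, a reader built this way
  stops with an error on a sparse index unless `b"sdir"` has been added to
  its known set.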