The fix is short (~30 lines), but the description is not. Sorry.
There is a set of problems caused by files in what I'll refer to as the
"present-despite-SKIP_WORKTREE" state. This commit aims to not just fix
these problems, but remove the entire class as a possibility -- for
those using sparse checkouts. But first, we need to understand the
problems this class presents. A quick outline:
* Problems
* User facing issues
* Problem space complexity
* Maintenance and code correctness challenges
* SKIP_WORKTREE expectations in Git
* Suggested solution
* Pros/Cons of suggested solution
* Notes on testcase modifications
=== User facing issues ===
There are various ways for users to get files to be present in the
working copy despite having the SKIP_WORKTREE bit set for that file in
the index. This may come from:
* various git commands not really supporting the SKIP_WORKTREE bit[1,2]
* users grabbing files from elsewhere and writing them to the worktree
(perhaps even cached in their editor)
* users attempting to "abort" a sparse-checkout operation with a
not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
working tree is not atomic)[3].
Once users have present-despite-SKIP_WORKTREE files, any modifications
users make to these files will be ignored, possibly to users' confusion.
Further:
* these files will degrade performance for the sparse-index case due
to requiring the index to be expanded (see commit 55dfcf9591
("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why
we try to delete entire directories outside the sparse cone).
* these files will not be updated by by standard commands
(switch/checkout/pull/merge/rebase will leave them alone unless
conflicts happen -- and even then, the conflicted file may be
written somewhere else to avoid overwriting the SKIP_WORKTREE file
that is present and in the way)
* there is nothing in Git that users can use to discover such
files (status, diff, grep, etc. all ignore it)
* there is no reasonable mechanism to "recover" from such a condition
(neither `git sparse-checkout reapply` nor `git reset --hard` will
correct it).
So, not only are users modifications ignored, but the files get
progressively more stale over time. At some point in the future, they
may change their sparseness specification or disable sparse-checkouts.
At that time, all present-despite-SKIP_WORKTREE files will show up as
having lots of modifications because they represent a version from a
different branch or commit. These might include user-made local changes
from days before, but the only way to tell is to have users look through
them all closely.
If these users come to others for help, there will be no logs that
explain the issue; it's just a mysterious list of changes. Users might
adamantly claim (correctly, as it turns out) that they didn't modify
these files, while others presume they did.
[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/
=== Problem space complexity ===
SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of
work on it initially, and several others have since come along and put
lots of work into it. Stolee spent most of 2021 on the sparse-index,
with lots of bugfixes along the way including to non-sparse-index cases
as we are still trying to get sparse checkouts to behave reasonably.
Basically every codepath throughout the treat needs to be aware of an
additional type of file: tracked-but-not-present. The extra type
results in lots of extra testcases and lots of extra code everywhere.
But, the sad thing is that we actually have more than one extra type.
We have tracked, tracked-but-not-present (SKIP_WORKTREE), and
tracked-but-promised-to-not-be-present-but-is-present-anyway
(present-despite-SKIP_WORKTREE). Two types is a monumental amount of
effort to support, and adding a third feels a bit like insanity[4].
[4] Some examples of which can be seen at
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
=== Maintenance and code correctness challenges ===
Matheus' patches to grep stalled for nearly a year, in part because of
complications of how to handle sparse-checkouts appropriately in all
cases[5][6] (with trying to sanely figure out how to sanely handle
present-despite-SKIP_WORKTREE files being one of the complications).
His rm/add follow-ups also took months because of those kinds of
issues[7]. The corner cases with things like submodules and
SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start
becoming really complex[8].
We've had to add ugly logic to merge-ort to attempt to handle
present-despite-SKIP_WORKTREE files[9], and basically just been forced
to give up in merge-recursive knowing full well that we'll sometimes
silently discard user modifications. Despite stash essentially being a
merge, it needed extra code (beyond what was in merge-ort and
merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid
a few different bugs that'd result in an early abort with a partial
stash application[10].
[5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t
and the dates on the thread; also Matheus and I had several
conversations off-list trying to resolve the issues over that time
[6] ...it finally kind of got unstuck after
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
[7] See for example
https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t
and quotes like "The core functionality of sparse-checkout has always
been only partially implemented", a statement I still believe is true
today.
[8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/
[9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE
handling with conflicted entries", 2021-03-20)
[10] See commit ba359fd507 ("stash: fix stash application in
sparse-checkouts", 2020-12-01)
=== SKIP_WORKTREE expectations in Git ===
A couple quotes:
* From [11] (before the "sparse-checkout" command existed):
If it needs too many special cases, hacks, and conditionals, then it
is not worth the complexity---if it is easier to write a correct code
by allowing Git to populate working tree files, it is perfectly fine
to do so.
In a sense, the sparse checkout "feature" itself is a hack by itself,
and that is why I think this part should be "best effort" as well.
* From the git-sparse-checkout manual (still present today):
THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
THE FUTURE.
[11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
=== Suggested solution ===
SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as
the name of the option implies, to allow the file to NOT be in the
worktree but consider it to be unchanged rather than deleted.
The suggests a simple solution: present-despite-SKIP_WORKTREE files
should not exist, for those using sparse-checkouts.
Enforce this at index loading time by checking if core.sparseCheckout is
true; if so, check files in the index with the SKIP_WORKTREE bit set to
verify that they are absent from the working tree. If they are present,
unset the bit (in memory, though any commands that write to the index
will record the update).
Users can, of course, can get the SKIP_WORKTREE bit back such as by
running `git sparse-checkout reapply` (if they have ensured the file is
unmodified and doesn't match the specified sparsity patterns).
=== Pros/Cons of suggested solution ===
Pros:
* Solves the user visible problems reported above, which I've been
complaining about for nearly a year but couldn't find a solution to.
* Helps prevent slow performance degradation with a sparse-index.
* Much easier behavior in sparse-checkouts for users to reason about
* Very simple, ~30 lines of code.
* Significantly simplifies some ugly testcases, and obviates the need
to test an entire class of potential issues.
* Reduces code complexity, reasoning, and maintenance. Avoids
disagreements about weird corner cases[12].
* It has been reported that some users might be (ab)using
SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree
mechanism[13, and a few other similar references]. These users know
of multiple caveats and shortcomings in doing so; perhaps not
surprising given the "SKIP_WORKTREE expecations" section above.
However, these users use `git update-index --skip-worktree`, and not
`git sparse-checkout` or core.sparseCheckout=true. As such, these
users would be unaffected by this change and can continue abusing
the system as before.
[12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree
Cons:
* When core.sparseCheckout is enabled, this adds a performance cost to
reading the index. I'll defer discussion of this cost to a subsequent
patch, since I have some optimizations to add.
=== Notes on testcase modifications ===
The good:
* t1011: Compare to two cases above it ('read-tree will not throw away
dirty changes, non-sparse'); since the file is present, it should
match the non-sparse case now
* t1092: sparse-index & sparse-checkout now match full-worktree
behavior in more cases! Yaay for consistency!
* t6428, t7012: look at how much simpler the tests become! Merge and
stash can just fail early telling the user there's a file in the
way, instead of not noticing until it's about to write a file and
then have to implement sudden crash avoidance. Hurray for sanity!
* t7817: sparse behavior better matches full tree behavior. Hurray
for sanity!
The confusing:
* t3705: These changes were ONLY needed on Windows, but they don't
hurt other platforms. Let's discuss each individually:
* core.sparseCheckout should be false by default. Nothing in this
testcase toggles that until many, many tests later. However,
early tests (#5 in particular) were testing `update-index
--skip-worktree` behavior in a non-sparse-checkout, but the
Windows tests in CI were behaving as if core.sparseCheckout=true
had been specified somewhere. I do not have access to a Windows
machine. But I just manually did what should have been a no-op
and turned the config off. And it fixed the test.
* I have no idea why the leftover .gitattributes file from this
test was causing failures for test #18 on Windows, but only with
these changes of mine. Test #18 was checking for empty stderr,
and specifically wanted to know that some error completely
unrelated to file endings did not appear. The leftover
.gitattributes file thus caused some spurious stderr unrelated to
the thing being checked. Since other tests did not intend to
test normalization, just proactively remove the .gitattributes
file. I'm certain this is cleaner and better, I'm just unsure
why/how this didn't trigger problems before.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `ensure_correct_sparsity` function is intended to provide a means of
aligning the in-core index with the sparsity required by the repository
settings and other properties of the index. The function first checks
whether a sparse index is allowed (per repository & sparse checkout pattern
settings). If the sparse index may be used, the index is converted to
sparse; otherwise, it is explicitly expanded with `ensure_full_index`.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When converting a full index to sparse, clear and recreate the cache tree
only if the cache tree is not fully valid. The convert_to_sparse operation
should exit silently if a cache tree update cannot be successfully completed
(e.g., due to a conflicted entry state). However, because this failure
scenario only occurs when at least a portion of the cache tree is invalid,
we can save ourselves the cost of clearing and recreating the cache tree by
skipping the check when the cache tree is fully valid.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In cone mode, the sparse-index code path learned to remove ignored
files (like build artifacts) outside the sparse cone, allowing the
entire directory outside the sparse cone to be removed, which is
especially useful when the sparse patterns change.
* ds/sparse-index-ignored-files:
sparse-checkout: clear tracked sparse dirs
sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag
attr: be careful about sparse directories
sparse-checkout: create helper methods
sparse-index: use WRITE_TREE_MISSING_OK
sparse-index: silently return when cache tree fails
unpack-trees: fix nested sparse-dir search
sparse-index: silently return when not using cone-mode patterns
t7519: rewrite sparse index test
The convert_to_sparse() method checks for the GIT_TEST_SPARSE_INDEX
environment variable or the "index.sparse" config setting before
converting the index to a sparse one. This is for ease of use since all
current consumers are preparing to compress the index before writing it
to disk. If these settings are not enabled, then convert_to_sparse()
silently returns without doing anything.
We will add a consumer in the next change that wants to use the sparse
index as an in-memory data structure, regardless of whether the on-disk
format should be sparse.
To that end, create the SPARSE_INDEX_MEMORY_ONLY flag that will skip
these config checks when enabled. All current consumers are modified to
pass '0' in the new 'flags' parameter.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we integrate the sparse index into more builtins, we occasionally
need to check the sparse-checkout patterns to see if a path is within
the sparse-checkout cone. Create some helper methods that help
initialize the patterns and check for pattern matching to make this
easier.
The existing callers of commands like get_sparse_checkout_patterns() use
a custom 'struct pattern_list' that is not necessarily the one in the
'struct index_state', so there are not many previous uses that could
adopt these helpers. There are just two in builtin/add.c and
sparse-index.c that can use path_in_sparse_checkout().
We add a path_in_cone_mode_sparse_checkout() as well that will only
return false if the path is outside of the sparse-checkout definition
_and_ the sparse-checkout patterns are in cone mode.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When updating the cache tree in convert_to_sparse(), the
WRITE_TREE_MISSING_OK flag indicates that trees might be computed that
do not already exist within the object database. This happens in cases
such as 'git add' creating new trees that it wants to store in
anticipation of a following 'git commit'. If this flag is not specified,
then it might trigger a promisor fetch or a failure due to the object
not existing locally.
Use WRITE_TREE_MISSING_OK during convert_to_sparse() to avoid these
possible reasons for the cache_tree_update() to fail.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If cache_tree_update() returns a non-zero value, then it could not
create the cache tree. This is likely due to a path having a merge
conflict. Since we are already returning early, let's return silently to
avoid making it seem like we failed to write the index at all.
If we remove our dependence on the cache tree within
convert_to_sparse(), then we could still recover from this scenario and
have a sparse index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the sparse-index is only enabled when core.sparseCheckoutCone is
also enabled, it is possible for the user to modify the sparse-checkout
file manually in a way that does not match cone-mode patterns. In this
case, we should refuse to convert an index into a sparse index, since
the sparse_checkout_patterns will not be initialized with recursive and
parent path hashsets.
Also silently return if there are no cache entries, which is a simple
case: there are no paths to make sparse!
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Copy the 'index_state->dir_hash' back to the real istate after expanding
a sparse index.
A crash was observed in 'git status' during some hashmap lookups with
corrupted hashmap entries. During an index expansion, new cache-entries
are added to the 'index_state->name_hash' and the 'dir_hash' in a
temporary 'index_state' variable 'full'. However, only the 'name_hash'
hashmap from this temp variable was copied back into the real 'istate'
variable. The original copy of the 'dir_hash' was incorrectly
preserved. If the table in the 'full->dir_hash' hashmap were realloced,
the stale version (in 'istate') would be corrupted.
The test suite does not operate on index sizes sufficiently large to
trigger this reallocation, so they do not cover this behavior.
Increasing the test suite to cover such scale is fragile and likely
wasteful.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When some commands run with command_requires_full_index=1, then the
index can get in a state where the in-memory cache tree is actually
equal to the sparse index's cache tree instead of the full one.
This results in incorrect entry_count values. By clearing the cache
tree before converting to sparse, we avoid this issue.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.
While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.
These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a full index from a sparse one, we create cache entries
for every blob within a given sparse directory entry. These are
correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED
flag is not included. The CE_EXTENDED flag would exist if we loaded a
full index from disk with these entries marked with CE_SKIP_WORKTREE, so
we can add the flag here to be consistent. This allows us to directly
compare the flags present in cache entries when testing the sparse-index
feature, but has no significance to its correctness in the user-facing
functionality.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse-index format is designed to be compatible with merge
conflicts, even those outside the sparse-checkout definition. The reason
is that when converting a full index to a sparse one, a cache entry with
nonzero stage will not be collapsed into a sparse directory entry.
However, this behavior was not tested, and a different behavior within
convert_to_sparse() fails in this scenario. Specifically,
cache_tree_update() will fail when unmerged entries exist.
convert_to_sparse_rec() uses the cache-tree data to recursively walk the
tree structure, but also to compute the OIDs used in the
sparse-directory entries.
Add an index scan to convert_to_sparse() that will detect if these merge
conflict entries exist and skip the conversion before trying to update
the cache-tree. This is marked as NEEDSWORK because this can be removed
with a suitable update to cache_tree_update() or a similar method that
can construct a cache-tree with invalid nodes, but still allow creating
the nodes necessary for creating sparse directory entries.
It is possible that in the future we will not need to make such an
update, since if we do not expand a sparse-index into a full one, this
conversion does not need to happen. Thus, this can be deferred until the
merge machinery is made to integrate with the sparse-index.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While testing the sparse-index, I verified a test with --valgrind and it
complained about an uninitialized value being used in a jump in the
path_matches_pattern_list() method. The line was this one:
if (*dtype == DT_UNKNOWN)
In the call stack, the culprit was the initialization of the dtype
variable in convert_to_sparse_rec().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the set_index_sparse_config() function by folding it into
set_sparse_index_config(), which was its only user.
Since 122ba1f7b5 (sparse-checkout: toggle sparse index from builtin,
2021-03-30) the flow of this code hasn't made much sense, we'd get
"enabled" in set_sparse_index_config(), proceed to call
set_index_sparse_config() with it.
There we'd call prepare_repo_settings() and set
"repo->settings.sparse_index = 1", only to needlessly call
prepare_repo_settings() again in set_sparse_index_config() (where it
would early abort), and finally setting "repo->settings.sparse_index =
enabled".
Instead we can just call prepare_repo_settings() once, and set the
variable to "enabled" in the first place.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some users of the index API have a specific path they are looking for,
but choose to use index_file_exists() to rely on the name-hash hashtable
instead of doing binary search with index_name_pos(). These users only
need to know a yes/no answer, not a position within the cache array.
When the index is sparse, the name-hash hash table does not contain the
full list of paths within sparse directories. It _does_ contain the
directory names for the sparse-directory entries.
Create a helper function, expand_to_path(), for intended use with the
name-hash hashtable functions. The integration with name-hash.c will
follow in a later change.
The solution here is to use ensure_full_index() when we determine that
the requested path is within a sparse directory entry. This will
populate the name-hash hashtable as the index is recomputed from
scratch.
There may be cases where the caller is trying to find an untracked path
that is not in the index but also is not within a sparse directory
entry. We want to minimize the overhead for these requests. If we used
index_name_pos() to find the insertion order of the path, then we could
determine from that position if a sparse-directory exists. (In fact,
just calling index_name_pos() in that case would lead to expanding the
index to a full index.) However, this takes O(log N) time where N is the
number of cache entries.
To keep the performance of this call based mostly on the input string,
use index_file_exists() to look for the ancestors of the path. Using the
heuristic that a sparse directory is likely to have a small number of
parent directories, we start from the bottom and build up. Use a string
buffer to allow mutating the path name to terminate after each slash for
each hashset test.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.
When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse index extension is used to signal that index writes should be
in sparse mode. This was only updated using GIT_TEST_SPARSE_INDEX=1.
Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that
specifies if the sparse index should be used. It also updates the index
to use the correct format, either way. Add a warning in the
documentation that the use of a repository extension might reduce
compatibility with third-party tools. 'git sparse-checkout init' already
sets extension.worktreeConfig, which places most sparse-checkout users
outside of the scope of most third-party tools.
Update t1092-sparse-checkout-compatibility.sh to use this CLI instead of
GIT_TEST_SPARSE_INDEX=1.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When enabled, this config option signals that index writes should
attempt to use sparse-directory entries.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A submodule is stored as a "Git link" that actually points to a commit
within a submodule. Submodules are populated or not depending on
submodule configuration, not sparse-checkout. To ensure that the
sparse-index feature integrates correctly with submodules, we should not
collapse a directory if there is a Git link within its range.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we have a full index, then we can convert it to a sparse index by
replacing directories outside of the sparse cone with sparse directory
entries. The convert_to_sparse() method does this, when the situation is
appropriate.
For now, we avoid converting the index to a sparse index if:
1. the index is split.
2. the index is already sparse.
3. sparse-checkout is disabled.
4. sparse-checkout does not use cone mode.
Finally, we currently limit the conversion to when the
GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git
config will be added in a later change.
The trickiest thing about this conversion is that we might not be able
to mark a directory as a sparse directory just because it is outside the
sparse cone. There might be unmerged files within that directory, so we
need to look for those. Also, if there is some strange reason why a file
is not marked with CE_SKIP_WORKTREE, then we should give up on
converting that directory. There is still hope that some of its
subdirectories might be able to convert to sparse, so we keep looking
deeper.
The conversion process is assisted by the cache-tree extension. This is
calculated from the full index if it does not already exist. We then
abandon the cache-tree as it no longer applies to the newly-sparse
index. Thus, this cache-tree will be recalculated in every
sparse-full-sparse round-trip until we integrate the cache-tree
extension with the sparse index.
Some Git commands use the index after writing it. For example, 'git add'
will update the index, then write it to disk, then read its entries to
report information. To keep the in-memory index in a full state after
writing, we re-expand it to a full one after the write. This is wasteful
for commands that only write the index and do not read from it again,
but that is only the case until we make those commands "sparse aware."
We can compare the behavior of the sparse-index in
t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1
when operating on the 'sparse-index' repo. We can also compare the two
sparse repos directly, such as comparing their indexes (when expanded to
full in the case of the 'sparse-index' repo). We also verify that the
index is actually populated with sparse directory entries.
The 'checkout and reset (mixed)' test is marked for failure when
comparing a sparse repo to a full repo, but we can compare the two
sparse-checkout cases directly to ensure that we are not changing the
behavior when using a sparse index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will mark an in-memory index_state as having sparse directory entries
with the sparse_index bit. These currently cannot exist, but we will add
a mechanism for collapsing a full index to a sparse one in a later
change. That will happen at write time, so we must first allow parsing
the format before writing it.
Commands or methods that require a full index in order to operate can
call ensure_full_index() to expand that index in-memory. This requires
parsing trees using that index's repository.
Sparse directory entries have a specific 'ce_mode' value. The macro
S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type.
This ce_mode is not possible with the existing index formats, so we don't
also verify all properties of a sparse-directory entry, which are:
1. ce->ce_mode == 0040000
2. ce->flags & CE_SKIP_WORKTREE is true
3. ce->name[ce->namelen - 1] == '/' (ends in dir separator)
4. ce->oid references a tree object.
These are all semi-enforced in ensure_full_index() to some extent. Any
deviation will cause a warning at minimum or a failure in the worst
case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upcoming changes will introduce modifications to the index format that
allow sparse directories. It will be useful to have a mechanism for
converting those sparse index files into full indexes by walking the
tree at those sparse directories. Name this method ensure_full_index()
as it will guarantee that the index is fully expanded.
This method is not implemented yet, and instead we focus on the
scaffolding to declare it and call it at the appropriate time.
Add a 'command_requires_full_index' member to struct repo_settings. This
will be an indicator that we need the index in full mode to do certain
index operations. This starts as being true for every command, then we
will set it to false as some commands integrate with sparse indexes.
If 'command_requires_full_index' is true, then we will immediately
expand a sparse index to a full one upon reading from disk. This
suffices for now, but we will want to add more callers to
ensure_full_index() later.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>