This just passes the filter-options struct to prepare_bitmap_walk().
Since the bitmap code doesn't actually support any filters yet, it will
fallback to the non-bitmap code if any --filter is specified. But this
lets us exercise that rejection code path, as well as getting us ready
to test filters via rev-list when we _do_ support them.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently you can't use object filters with bitmaps, but we plan to
support at least some filters with bitmaps. Let's introduce some
infrastructure that will help us do that:
- prepare_bitmap_walk() now accepts a list_objects_filter_options
parameter (which can be NULL for no filtering; all the current
callers pass this)
- we'll bail early if the filter is incompatible with bitmaps (just as
we would if there were no bitmaps at all). Currently all filters are
incompatible.
- we'll filter the resulting bitmap; since there are no supported
filters yet, this is always a noop.
There should be no behavior change yet, but we'll support some actual
filters in a future patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since we added reachability bitmap support, we've been able to use
it with rev-list to get the full list of objects, like:
git rev-list --objects --use-bitmap-index --all
But you can't do so without --objects, since we weren't ready to just
show the commits. However, the internals of the bitmap code are mostly
ready for this: they avoid opening up trees when walking to fill in the
bitmaps. We just need to actually pass in the rev_info to
traverse_bitmap_commit_list() so it knows which types to bother
triggering our callback for.
For completeness, the perf test now covers both the existing --objects
case, as well as the new commits-only behavior (the objects one got way
faster when we introduced bitmaps, but obviously isn't improved now).
Here are numbers for linux.git:
Test HEAD^ HEAD
------------------------------------------------------------------------
5310.7: rev-list (commits) 8.29(8.10+0.19) 1.76(1.72+0.04) -78.8%
5310.8: rev-list (objects) 8.06(7.94+0.12) 8.14(7.94+0.13) +1.0%
That run was cheating a little, as I didn't have any commit-graph in the
repository, and we'd built it by default these days when running git-gc.
Here are numbers with a commit-graph:
Test HEAD^ HEAD
------------------------------------------------------------------------
5310.7: rev-list (commits) 0.70(0.58+0.12) 0.51(0.46+0.04) -27.1%
5310.8: rev-list (objects) 6.20(6.09+0.10) 6.27(6.16+0.11) +1.1%
Still an improvement, but a lot less impressive.
We could have the perf script remove any commit-graph to show the
out-sized effect, but it probably makes sense to leave it in what would
be a more typical setup.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The prior commit taught "--count --objects" to work without bitmaps. We
should be able to get the same answer much more quickly with bitmaps.
Note that we punt on the max_count case here. This perhaps _could_ be
made to work if we find all of the boundary commits and treat them as
UNINTERESTING, subtracting them (and their reachable objects) from the
set we return. That implies an actual commit traversal, but we'd still
be faster due to avoiding opening up any trees. Given the complexity and
the fact that anyone is unlikely to want this, it makes sense to just
fall back to the non-bitmap case for now.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current behavior from "rev-list --count --objects" is nonsensical:
we enumerate all of the objects except commits, but then give a count of
commits. This wasn't planned, and is just what the code happens to do.
Instead, let's give the answer the user almost certainly wanted: the
full count of objects.
Note that there are more complicated cases around cherry-marking, etc.
We'll punt on those for now, but let the user know that we can't produce
an answer (rather than giving them something useless).
We'll test both the new feature as well as a vanilla --count of commits,
since that surprisingly doesn't seem to be covered in the existing
tests.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a few operations in rev-list that are optimized for bitmaps.
Rather than having the code inline in cmd_rev_list(), let's move them
into helpers. This not only makes the flow of the main function simpler,
but it lets us replace the complex "can we do the optimization?"
conditionals with a series of early returns from the functions. That
also makes it easy to add comments explaining those conditions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rev-list has refused to use bitmaps with pathspec limiting since
c8a70d3509 (rev-list: disable --use-bitmap-index when pruning commits,
2015-07-01). But this is true not just for rev-list, but for anyone who
calls prepare_bitmap_walk(); the code isn't equipped to handle this
case. We never noticed because the only other callers would never pass
a pathspec limiter.
But let's push the check down into prepare_bitmap_walk() anyway. That's
a more logical place for it to live, as callers shouldn't need to know
the details (and must be prepared to fall back to a regular traversal
anyway, since there might not be bitmaps in the repository).
It would also prepare us for a day where this case _is_ handled, but
that's pretty unlikely. E.g., we could use bitmaps to generate the set
of commits, and then diff each commit to see if it matches the pathspec.
That would be slightly faster than a naive traversal that actually walks
the commits. But you'd probably do better still to make use of the newer
commit-graph feature to make walking the commits very cheap.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--use-bitmap-index" option is usually aspirational: if we have
bitmaps and the request can be fulfilled more quickly using them we'll
do so, but otherwise fall back to a non-bitmap traversal.
The exception is object filtering, which explicitly dies if the two
options are combined. Let's convert this to the usual fallback behavior.
This is a minor convenience for now (since the caller can easily know
that --filter and --use-bitmap-index don't combine), but will become
much more useful as we start to support _some_ filters with bitmaps, but
not others.
The test infrastructure here is bigger than necessary for checking this
one small feature. But it will serve as the basis for more filtering
bitmap tests in future patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep --no-index" should not get affected by the contents of
the .gitmodules file but when "--recurse-submodules" is given or
the "submodule.recurse" variable is set, it did. Now these
settings are ignored in the "--no-index" mode.
* pb/do-not-recurse-grep-no-index:
grep: ignore --recurse-submodules if --no-index is given
"git rebase --fork-point master" used to work OK, as it internally
called "git merge-base --fork-point" that knew how to handle short
refname and dwim it to the full refname before calling the
underlying get_fork_point() function.
This is no longer true after the command was rewritten in C, as its
internall call made directly to get_fork_point() does not dwim a
short ref.
Move the "dwim the refname argument to the full refname" logic that
is used in "git merge-base" to the underlying get_fork_point()
function, so that the other caller of the function in the
implementation of "git rebase" behaves the same way to fix this
regression.
Signed-off-by: Alex Torok <alext9@gmail.com>
[jc: revamped the fix and used Alex's tests]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using Windows, a user may run 'git sparse-checkout set A\B\C'
to add the Unix-style path A/B/C to their sparse-checkout patterns.
Normalizing the input path converts the backslashes to slashes before we
add the string 'A/B/C' to the recursive hashset.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the sparse-checkout feature, a user may want to incrementally
grow their sparse-checkout pattern set. Allow adding patterns using a
new 'add' subcommand. This is not much different from the 'set'
subcommand, because we still want to allow the '--stdin' option and
interpret inputs as directories when in cone mode and patterns
otherwise.
When in cone mode, we are growing the cone. This may actually reduce the
set of patterns when adding directory A when A/B is already a directory
in the cone. Test the different cases: siblings, parents, ancestors.
When not in cone mode, we can only assume the patterns should be
appended to the sparse-checkout file.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of adding "add" and "remove" subcommands to the
sparse-checkout builtin, extract a modify_pattern_list() method from the
sparse_checkout_set() method. This command will read input from the
command-line or stdin to construct a set of patterns, then modify the
existing sparse-checkout patterns after a successful update of the
working directory.
Currently, the only way to modify the patterns is to replace all of the
patterns. This will be extended in a later update.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of extending the sparse-checkout builtin with "add"
and "remove" subcommands, extract the code that fills a pattern list
based on the input values. The input changes depending on the
presence of "--stdin" or the value of core.sparseCheckoutCone.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When renaming a remote with
git remote rename X Y
git remote remove X
Git already renames or removes any branch.<name>.remote and
branch.<name>.pushRemote configurations if their value is X.
However remote.pushDefault needs a more gentle approach, as this may be
set in a non-repo configuration file. In such a case only a warning is
printed, such as:
warning: The global configuration remote.pushDefault in:
$HOME/.gitconfig:35
now names the non-existent remote origin
It is changed to remote.pushDefault = Y or removed when set in a repo
configuration though.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When renaming or removing a remote with
git remote rename X Y
git remote remove X
Git already renames/removes any config values from
branch.<name>.remote = X
to
branch.<name>.remote = Y
As branch.<name>.pushRemote also names a remote, it now also renames
or removes these config values from
branch.<name>.pushRemote = X
to
branch.<name>.pushRemote = Y
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some minor clean-ups in function `config_read_branches`:
* remove hardcoded length in `key += 7`
* call `xmemdupz` only once
* use a switch to handle the configuration type and add a `BUG()`
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 46af44b07d (pull --rebase=<type>: allow single-letter abbreviations
for the type, 2018-08-04) landed in Git, it had the side effect that
not only 'pull --rebase=<type>' accepted the single-letter abbreviations
but also the 'pull.rebase' and 'branch.<name>.rebase' configurations.
However, 'git remote rename' did not honor these single-letter
abbreviations when reading the 'branch.*.rebase' configurations.
We now document the single-letter abbreviations and both code places
share a common function to parse the values of 'git pull --rebase=*',
'pull.rebase', and 'branches.*.rebase'.
The only functional change is the handling of the `branch_info::rebase`
value. Before it was an unsigned enum, thus the truth value could be
checked with `branch_info::rebase != 0`. But `enum rebase_type` is
signed, thus the truth value must now be checked with
`branch_info::rebase >= REBASE_TRUE`
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a user queries config values with --show-origin, often it's
difficult to determine what the actual "scope" (local, global, etc.) of
a given value is based on just the origin file.
Teach 'git config' the '--show-scope' option to print the scope of all
displayed config values. Note that we should never see anything of
"submodule" scope as that is only ever used by submodule-config.c when
parsing the '.gitmodules' file.
Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are many situations where the scope of a config command is known
beforehand, such as passing of '--local', '--file', etc. to an
invocation of git config. However, this information is lost when moving
from builtin/config.c to /config.c. This historically hasn't been a big
deal, but to prepare for the upcoming --show-scope option we teach
git_config_source to keep track of the source and the config machinery
to use that information to set current_parsing_scope appropriately.
Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function for inserting a C string into a strbuf. Use it
throughout the source to get rid of magic string length constants and
explicit strlen() calls.
Like strbuf_addstr(), implement it as an inline function to avoid the
implicit strlen() calls to cause runtime overhead.
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
advice.addNothing config variable is used to control the visibility of
two advice messages in the add library. This config variable is
replaced by two new variables, whose names are more clear and relevant
to the two cases.
Also add the two new variables to the documentation.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git checkout X" did not correctly fail when X is not a local
branch but could name more than one remote-tracking branches
(i.e. to be dwimmed as the starting point to create a corresponding
local branch), which has been corrected.
* am/checkout-file-and-ref-ref-ambiguity:
checkout: don't revert file on ambiguous tracking branches
parse_branchname_arg(): extract part as new function
The effort to move "git-add--interactive" to C continues.
* js/patch-mode-in-others-in-c:
commit --interactive: make it work with the built-in `add -i`
built-in add -p: implement the "worktree" patch modes
built-in add -p: implement the "checkout" patch modes
built-in stash: use the built-in `git add -p` if so configured
legacy stash -p: respect the add.interactive.usebuiltin setting
built-in add -p: implement the "stash" and "reset" patch modes
built-in add -p: prepare for patch modes other than "stage"
name_ref() is called for each ref and checks if its a better name for
the referenced commit. If that's the case it remembers it and checks if
a name based on it is better for its ancestors as well. This in done in
the the order for_each_ref() imposes on us.
That might not be optimal. If bad names happen to be encountered first
(as defined by is_better_name()), names derived from them may spread to
a lot of commits, only to be replaced by better names later. Setting
better names first can avoid that.
is_better_name() prefers tags, short distances and old references. The
distance is a measure that we need to calculate for each candidate
commit, but the other two properties are not dependent on the
relationships of commits. Sorting the refs by them should yield better
performance than the essentially random order we currently use.
And applying older references first should also help to reduce rework
due to the fact that older commits have less ancestors than newer ones.
So add all details of names to the tip table first, then sort them
to prefer tags and older references and then apply them in this order.
Here's the performance as measures by hyperfine for the Linux repo
before:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 851.1 ms ± 4.5 ms [User: 806.7 ms, System: 44.4 ms]
Range (min … max): 845.9 ms … 859.5 ms 10 runs
... and with this patch:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 736.2 ms ± 8.7 ms [User: 688.4 ms, System: 47.5 ms]
Range (min … max): 726.0 ms … 755.2 ms 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
name_rev() assigns a name to a commit and its parents and grandparents
and so on. Commits share their name string with their first parent,
which in turn does the same, recursively to the root. That saves a lot
of allocations. When a better name is found, the old name is replaced,
but its memory is not released. That leakage can become significant.
Can we release these old strings exactly once even though they are
referenced multiple times? Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it set a new name
for it and tries to update their names as well.
Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0. These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.
That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.
If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.
We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.
For the Chromium repo, releasing superceded names reduces the memory
footprint of name-rev --all significantly. Here's the output of GNU
time before:
0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
0inputs+0outputs (0major+571470minor)pagefaults 0swaps
... and with this patch:
1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
0inputs+0outputs (0major+314370minor)pagefaults 0swaps
It also gets faster; hyperfine before:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.534 s ± 0.006 s [User: 1.039 s, System: 0.494 s]
Range (min … max): 1.522 s … 1.542 s 10 runs
... and with this patch:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.338 s ± 0.006 s [User: 1.047 s, System: 0.291 s]
Range (min … max): 1.327 s … 1.346 s 10 runs
For the Linux repo it doesn't pay off; memory usage only gets down from:
0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
0inputs+0outputs (0major+44579minor)pagefaults 0swaps
... to:
0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
0inputs+0outputs (0major+44892minor)pagefaults 0swaps
The runtime actually increases slightly from:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 828.8 ms ± 5.0 ms [User: 797.2 ms, System: 31.6 ms]
Range (min … max): 824.1 ms … 838.9 ms 10 runs
... to:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 847.6 ms ± 3.4 ms [User: 807.9 ms, System: 39.6 ms]
Range (min … max): 843.4 ms … 854.3 ms 10 runs
Why is that? In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB, so the average discarded
name is ca. 1000x longer in the latter.
Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leave setting the tip_name member of struct rev_name to callers of
create_or_update_name(). This avoids allocations for names that are
rejected by that function. Here's how this affects the runtime when
working with a fresh clone of Git's own repository; performance numbers
by hyperfine before:
Benchmark #1: ./git -C ../git-pristine/ name-rev --all
Time (mean ± σ): 437.8 ms ± 4.0 ms [User: 422.5 ms, System: 15.2 ms]
Range (min … max): 432.8 ms … 446.3 ms 10 runs
... and with this patch:
Benchmark #1: ./git -C ../git-pristine/ name-rev --all
Time (mean ± σ): 408.5 ms ± 1.4 ms [User: 387.2 ms, System: 21.2 ms]
Range (min … max): 407.1 ms … 411.7 ms 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can calculate the size of new name easily and precisely. Open-code
the xstrfmt() calls and grow the buffers as needed before filling them.
This provides a surprisingly large benefit when working with the
Chromium repository; here are the numbers measured using hyperfine
before:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 5.822 s ± 0.013 s [User: 5.304 s, System: 0.516 s]
Range (min … max): 5.803 s … 5.837 s 10 runs
... and with this patch:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.527 s ± 0.003 s [User: 1.015 s, System: 0.511 s]
Range (min … max): 1.524 s … 1.535 s 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce nesting by moving code to come up with a name for the parent into
its own function.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit slab commit_rev_name contains a pointer to a struct rev_name,
and the actual struct is allocated separatly. Avoid that allocation and
pointer indirection by storing the full struct in the commit slab. Use
the tip_name member pointer to determine if the returned struct is
initialized.
Performance in the Linux repository measured with hyperfine before:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 953.5 ms ± 6.3 ms [User: 901.2 ms, System: 52.1 ms]
Range (min … max): 945.2 ms … 968.5 ms 10 runs
... and with this patch:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 851.0 ms ± 3.1 ms [User: 807.4 ms, System: 43.6 ms]
Range (min … max): 846.7 ms … 857.0 ms 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Look up the commit slab slot for the commit once using
commit_rev_name_at() and populate it in case it is empty, instead of
checking for emptiness in a separate step using commit_rev_name_peek()
via get_commit_rev_name().
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
name_ref() duplicates the path string and passes it to name_rev(), which
either puts it into a commit slab or ignores it if there is already a
better name, leaking it. Move the duplication to name_rev() and release
the copy in the latter case.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Keep the const qualifier of the first parameter of get_rev_name() even
when casting the object pointer to a commit pointer, and further for the
parameter of get_commit_rev_name(), as all these uses are read-only.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The type alias became unused with bf43abc6e6 (name-rev: use sizeof(*ptr)
instead of sizeof(type) in allocation, 2019-11-12); remove it.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This code was moved straight out of name_rev(). As such, we inherited
the "goto" to jump from an if into an else-if. We also inherited the
fact that "nothing to do -- return NULL" is handled last.
Rewrite the function to first handle the "nothing to do" case. Then we
can handle the conditional allocation early before going on to populate
the struct. No need for goto-ing.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're resolving a REF_DELTA, we compare-and-swap its type from
REF_DELTA to whatever real type the base object has, as discussed in
ab791dd138 (index-pack: fix race condition with duplicate bases,
2014-08-29). If the old type wasn't a REF_DELTA, we consider that a
BUG(). But as discussed in that commit, we might see this case whenever
we try to resolve an object twice, which may happen because we have
multiple copies of the base object.
So this isn't a bug at all, but rather a sign that the input pack is
broken. And indeed, this case is triggered already in t5309.5 and
t5309.6, which create packs with delta cycles and duplicate bases. But
we never noticed because those tests are marked expect_failure.
Those tests were added by b2ef3d9ebb (test index-pack on packs with
recoverable delta cycles, 2013-08-23), which was leaving the door open
for cases that we theoretically _could_ handle. And when we see an
already-resolved object like this, in theory we could keep going after
confirming that the previously resolved child->real_type matches
base->obj->real_type. But:
- enforcing the "only resolve once" rule here saves us from an
infinite loop in other parts of the code. If we keep going, then the
delta cycle in t5309.5 causes us to loop infinitely, as
find_ref_delta_children() doesn't realize which objects have already
been resolved. So there would be more changes needed to make this
case work, and in the meantime we'd be worse off.
- any pack that triggers this is broken anyway. It either has a
duplicate base object, or it has a cycle which causes us to bring in
a duplicate via --fix-thin. In either case, we'd end up rejecting
the pack in write_idx_file(), which also detects duplicates.
So the tests have little value in documenting what we _could_ be doing
(and have been neglected for 6+ years). Let's switch them to confirming
that we handle this case cleanly (and switch out the BUG() for a more
informative die() so that we do so).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply a similar treatment as in the previous patch to pass a 'struct
object_directory *' through the 'load_commit_graph_one_fd_st'
initializer, too.
This prevents a potential bug where a pointer comparison is made to a
NULL 'g->odb', which would cause the commit-graph machinery to think
that a pair of commit-graphs belonged to different alternates when in
fact they do not (i.e., in the case of no '--object-dir').
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of the previous patch, all calls to 'commit-graph.c' functions which
perform path normalization (for e.g., 'get_commit_graph_filename()') are
of the form 'ctx->odb->path', which is always in normalized form.
Now that there are no callers passing non-normalized paths to these
functions, ensure that future callers are bound by the same restrictions
by making these functions take a 'struct object_directory *' instead of
a 'const char *'. To match, replace all calls with arguments of the form
'ctx->odb->path' with 'ctx->odb' To recover the path, functions that
perform path manipulation simply use 'odb->path'.
Further, avoid string comparisons with arguments of the form
'odb->path', and instead prefer raw pointer comparisons, which
accomplish the same effect, but are far less brittle.
This has a pleasant side-effect of making these functions much more
robust to paths that cannot be normalized by 'normalize_path_copy()',
i.e., because they are outside of the current working directory.
For example, prior to this patch, Valgrind reports that the following
uninitialized memory read [1]:
$ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ )
because 'normalize_path_copy()' can't normalize '../.git' (since it's
relative to but above of the current working directory) [2].
By using a 'struct object_directory *' directly,
'get_commit_graph_filename()' does not need to normalize, because all
paths are relative to the current working directory since they are
always read from the '->path' of an object directory.
[1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net.
[2]: The bug here is that 'get_commit_graph_filename()' returns the
result of 'normalize_path_copy()' without checking the return
value.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a previous patch, the 'char *object_dir' in 'struct commit_graph' was
replaced with a 'struct object_directory'. This patch applies the same
treatment to 'struct commit_graph', which is another intermediate step
towards getting rid of all path normalization in 'commit-graph.c'.
Instead of taking a 'char *object_dir', functions that construct a
'struct commit_graph' now take a 'struct object_directory *'. Any code
that needs an object directory path use '->path' instead.
This ensures that all calls to functions that perform path normalization
are given arguments which do not themselves require normalization. This
prepares those functions to drop their normalization entirely, which
will occur in the subsequent patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are lots of places in 'commit-graph.h' where a function either has
(or almost has) a full 'struct object_directory *', accesses '->path',
and then throws away the rest of the struct.
This can cause headaches when comparing the locations of object
directories across alternates (e.g., in the case of deciding if two
commit-graph layers can be merged). These paths are normalized with
'normalize_path_copy()' which mitigates some comparison issues, but not
all [1].
Replace usage of 'char *object_dir' with 'odb->path' by storing a
'struct object_directory *' in the 'write_commit_graph_context'
structure. This is an intermediate step towards getting rid of all path
normalization in 'commit-graph.c'.
Resolving a user-provided '--object-dir' argument now requires that we
compare it to the known alternates for equality. Prior to this patch,
an unknown '--object-dir' argument would silently exit with status zero.
This can clearly lead to unintended behavior, such as verifying
commit-graphs that aren't in a repository's own object store (or one of
its alternates), or causing a typo to mask a legitimate commit-graph
verification failure. Make this error non-silent by 'die()'-ing when the
given '--object-dir' does not match any known alternate object store.
[1]: In my testing, for example, I can get one side of the commit-graph
code to fill object_dir with "./objects" and the other with just
"objects".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse-checkout patterns allow special globs according to
fnmatch(3). When writing cone-mode patterns for paths containing
these characters, they must be escaped.
Use is_glob_special() to check which characters must be escaped
this way, and add a path to the tests that contains all glob
characters at once. Note that ']' is not special, since the
initial bracket '[' is escaped.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When in cone mode, the 'git sparse-checkout list' subcommand lists
the directories included in the sparse cone. When these directories
contain odd characters, such as a backslash, then we need to use
C-style quotes similar to 'git ls-tree'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.
Even more specifically, the goal is to always allow the following from
the root of a repo:
git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin
The ls-tree command provides directory names with an unescaped asterisk.
It also quotes the directories that contain an escaped backslash. We
must remove these quotes, then keep the escaped backslashes.
Use unquote_c_style() when parsing lines from stdin. Command-line
arguments will be parsed as-is, assuming the user can do the correct
level of escaping from their environment to match the exact directory
names.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.
However, there is some care needed for the timing of these escapes. The
in-memory pattern list is used to update the working directory before
writing the patterns to disk. Thus, we need the command to have the
unescaped names in the hashsets for the cone comparisons, then escape
the patterns later.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We often skip an optional prefix in a string with a hardcoded
constant, e.g.
if (starts_with(string, "prefix"))
string += 6;
which is less error prone when written
skip_prefix(string, "prefix", &string);
Note that this changes a few error messages from "git reflog expire
--expire=nonsense.timestamp", which used to complain by saying
'--expire=nonsense.timestamp' is not a valid timestamp
but with this change, we say
'nonsense.timestamp' is not a valid timestamp
which is more technically correct (the string with --expire= as
a prefix obviously cannot be a valid timestamp, but the error is
about the part of the input without that prefix).
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some callers of check_object_signature() can work on arbitrary
repositories, but the repo does not get passed to this function.
Instead, the_repository is always used internally. To fix possible
inconsistencies, allow the function to receive a struct repository and
make those callers pass on the repo being handled.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some callers of open_istream() at archive-tar.c and archive-zip.c are
capable of working on arbitrary repositories but the repo struct is not
passed down to open_istream(), which uses the_repository internally. For
now, that's not a problem since the said callers are only being called
with the_repository. But to be consistent and avoid future problems,
let's allow open_istream() to receive a struct repository and use that
instead of the_repository. This parameter addition will also be used in
a future patch to make sha1-file.c:check_object_signature() be able to
work on arbitrary repos.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>