Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use
the ref-filter machinery to select references or branches (respectively)
that are reachable from a set of commits presented by one or more
--merged arguments. This happens within reach_filter(), which uses the
revision-walk machinery to walk history in a standard way.
However, the commit-reach.c file is full of custom searches that are
more efficient, especially for reachability queries that can terminate
early when reachability is discovered. Add a new
tips_reachable_from_bases() method to commit-reach.c and call it from
within reach_filter() in ref-filter.c. This affects both 'git branch'
and 'git for-each-ref' as tested in p1500-graph-walks.sh.
For the Linux kernel repository, we take an already-fast algorithm and
make it even faster:
Test HEAD~1 HEAD
-------------------------------------------------------------------
1500.5: contains: git for-each-ref --merged 0.13 0.02 -84.6%
1500.6: contains: git branch --merged 0.14 0.02 -85.7%
1500.7: contains: git tag --merged 0.15 0.03 -80.0%
(Note that we remove the iterative 'git rev-list' test from p1500
because it no longer makes sense as a comparison to 'git for-each-ref'
and would just waste time running it for these comparisons.)
The algorithm is implemented in commit-reach.c in the method
tips_reachable_from_base(). This method takes a string_list of tips and
assigns the 'util' for each item with the value 1 if the base commit can
reach those tips.
Like other reachability queries in commit-reach.c, the fastest way to
search for "can A reach B?" is to do a depth-first search up to the
generation number of B, preferring to explore first parents before later
parents. While we must walk all reachable commits up to that generation
number when the answer is "no", the depth-first search can answer "yes"
much faster than other approaches in most cases.
This search becomes trickier when there are multiple targets for the
depth-first search. The commits with lower generation number are more
likely to be within the history of the start commit, but we don't want
to waste time searching commits of low generation number if the commit
target with lowest generation number has already been found.
The trick here is to take the input commits and sort them by generation
number in ascending order. Track the index within this order as
min_generation_index. When we find a commit, if its index in the list is
equal to min_generation_index, then we can increase the generation
number boundary of our search to the next-lowest value in the list.
With this mechanism, the number of commits to search is minimized with
respect to the depth-first search heuristic. We will walk all commits up
to the minimum generation number of a commit that is _not_ reachable
from the start, but we will walk only the necessary portion of the
depth-first search for the reachable commits of lower generation.
Add extra tests for this behavior in t6600-test-reach.sh as the
interesting data shape of that repository can sometimes demonstrate
corner case bugs.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change implemented the ahead_behind() method, including an
algorithm to compute the ahead/behind values for a number of commit tips
relative to a number of commit bases. Now, integrate that algorithm as
part of 'git for-each-ref' hidden behind a new format atom,
ahead-behind. This naturally extends to 'git branch' and 'git tag'
builtins, as well.
This format allows specifying multiple bases, if so desired, and all
matching references are compared against all of those bases. For this
reason, failing to read a reference provided from these atoms results in
an error.
In order to translate the ahead_behind() method information to the
format output code in ref-filter.c, we must populate arrays of
ahead_behind_count structs. In struct ref_array, we store the full array
that will be passed to ahead_behind(). In struct ref_array_item, we
store an array of pointers that point to the relvant items within the
full array. In this way, we can pull all relevant ahead/behind values
directly when formatting output for a specific item. It also ensures the
lifetime of the ahead_behind_count structs matches the time that the
array is being used.
Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as
it has an interesting repository shape. In particular, its merging
strategy and its use of different commit-graphs would demonstrate over-
counting if the ahead_behind() method did not already account for that
possibility.
Also add tests for the specific for-each-ref, branch, and tag builtins.
In the case of 'git tag', there are intersting cases that happen when
some of the selected tips are not commits. This requires careful logic
around commits_nr in the second loop of filter_ahead_behind(). Also, the
test in t7004 is carefully located to avoid being dependent on the GPG
prereq. It also avoids using the test_commit helper, as that will add
ticks to the time and disrupt the expected timestamps in later tag
tests.
Also add performance tests in a new p1300-graph-walks.sh script. This
will be useful for more uses in the future, but for now compare the
ahead-behind counting algorithm in 'git for-each-ref' to the naive
implementation by running 'git rev-list --count' processes for each
input.
For the Git source code repository, the improvement is already obvious:
Test this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00)
1500.3: ahead-behind counts: git branch 0.07(0.06+0.00)
1500.4: ahead-behind counts: git tag 0.07(0.06+0.00)
1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27)
But the standard performance benchmark is the Linux kernel repository,
which demosntrates a significant improvement:
Test this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02)
1500.3: ahead-behind counts: git branch 0.27(0.24+0.03)
1500.4: ahead-behind counts: git tag 0.28(0.27+0.01)
1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54)
The 'git rev-list' test exists in this change as a demonstration, but it
will be removed in the next change to avoid wasting time on this
comparison.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's more readable to use starts_with() instead of strncmp() to match a
prefix, as the latter requires a manually-computed length, and has the
funny "matching is zero" return value common to cmp functions. This
patch converts several cases which were found with:
git grep 'strncmp(.*, [0-9]*)'
But note that it doesn't convert all such cases. There are several where
the magic length number is repeated elsewhere in the code, like:
/* handle "buf" which isn't NUL-terminated and might be too small */
if (len >= 3 && !strncmp(buf, "foo", 3))
or:
/* exact match for "foo", but within a larger string */
if (end - buf == 3 && !strncmp(buf, "foo", 3))
While it would not produce the wrong outcome to use starts_with() in
these cases, we'd still be left with one instance of "3". We're better
to leave them for now, as the repeated "3" makes it clear that the two
are linked (there may be other refactorings that handle both, but
they're out of scope for this patch).
A few things to note while reading the patch:
- all cases but one are trying to match, and so lose the extra "!".
The case in the first hunk of urlmatch.c is not-matching, and hence
gains a "!".
- the case in remote-fd.c is matching the beginning of "connect foo",
but we never look at str+8 to parse the "foo" part (which would make
this a candidate for skip_prefix(), not starts_with()). This seems
at first glance like a bug, but is a limitation of how remote-fd
works.
- the second hunk in urlmatch.c shows some cases adjacent to other
strncmp() calls that are left. These are of the "exact match within
a larger string" type, as described above.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clean-ups in error messages produced by "git for-each-ref" and friends.
* jk/ref-filter-error-reporting-fix:
ref-filter: convert email atom parser to use err_bad_arg()
ref-filter: truncate atom names in error messages
ref-filter: factor out "unrecognized %(foo) arg" errors
ref-filter: factor out "%(foo) does not take arguments" errors
ref-filter: reject arguments to %(HEAD)
Code clean-up around unused function parameters.
* jk/unused-post-2.39:
userdiff: mark unused parameter in internal callback
list-objects-filter: mark unused parameters in virtual functions
diff: mark unused parameters in callbacks
xdiff: mark unused parameter in xdl_call_hunk_func()
xdiff: drop unused parameter in def_ff()
ws: drop unused parameter from ws_blank_line()
list-objects: drop process_gitlink() function
blob: drop unused parts of parse_blob_buffer()
ls-refs: use repository parameter to iterate refs
The error message for a bogus argument to %(authoremail), etc, is:
$ git for-each-ref --format='%(authoremail:foo)'
fatal: unrecognized email option: foo
Saying just "email" is a little vague; most of the other atom parsers
would use the full name "%(authoremail)", but we can't do that here
because the same function also handles %(taggeremail), etc. Until
recently, passing atom->name was a bad idea, because it erroneously
included the arguments in the atom name. But since the previous commit
taught err_bad_arg() to handle this, we can now do so and get:
fatal: unrecognized %(authoremail) argument: foo
which is consistent with other atoms.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you pass a bogus argument to %(refname), you may end up with a
message like this:
$ git for-each-ref --format='%(refname:foo)'
fatal: unrecognized %(refname:foo) argument: foo
which is confusing. It should just say:
fatal: unrecognized %(refname) argument: foo
which is clearer, and is consistent with most other atom parsers. Those
other parsers do not have the same problem because they pass the atom
name from a string literal in the parser function. But because the
parser for %(refname) also handles %(upstream) and %(push), it instead
uses atom->name, which includes the arguments. The oid atom parser which
handles %(tree), %(parent), etc suffers from the same problem.
It seems like the cleanest fix would be for atom->name to be _just_ the
name, since there's already a separate "args" field. But since that
field is also used for other things, we can't change it easily (e.g.,
it's how we find things in the used_atoms array, and clearly %(refname)
and %(refname:short) are not the same thing).
Instead, we'll teach our error_bad_arg() function to stop at the first
":". This is a little hacky, as we're effectively re-parsing the name,
but the format is simple enough to do this as a one-liner, and this
localizes the change to the error-reporting code.
We'll give the same treatment to err_no_arg(). None of its callers use
this atom->name trick, but it's worth future-proofing it while we're
here.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Atom parsers that take arguments generally have a catch-all for "this
arg is not recognized". Most of them use the same printf template, which
is good, because it makes life easier for translators. Let's pull this
template into a helper function, which makes the code in the parsers
shorter and avoids any possibility of differences.
As with the previous commit, we'll pick an arbitrary atom to make sure
the test suite covers this code.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many atom parsers give the same error message, differing only in the
name of the atom. If we use "%s does not take arguments", that should
make life easier for translators, as they only need to translate one
string. And in doing so, we can easily pull it into a helper function to
make sure they are all using the exact same string.
I've added a basic test here for %(HEAD), just to make sure this code is
exercised at all in the test suite. We could cover each such atom, but
the effort-to-reward ratio of trying to maintain an exhaustive list
doesn't seem worth it.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The %(HEAD) atom doesn't take any arguments, but unlike other atoms in
the same boat (objecttype, deltabase, etc), it does not detect this
situation and complain. Let's make it consistent with the others.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various leak fixes.
* ab/various-leak-fixes:
built-ins: use free() not UNLEAK() if trivial, rm dead code
revert: fix parse_options_concat() leak
cherry-pick: free "struct replay_opts" members
rebase: don't leak on "--abort"
connected.c: free the "struct packed_git"
sequencer.c: fix "opts->strategy" leak in read_strategy_opts()
ls-files: fix a --with-tree memory leak
revision API: call graph_clear() in release_revisions()
unpack-file: fix ancient leak in create_temp_file()
built-ins & libs & helpers: add/move destructors, fix leaks
dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache"
read-cache.c: clear and free "sparse_checkout_patterns"
commit: discard partial cache before (re-)reading it
{reset,merge}: call discard_index() before returning
tests: mark tests as passing with SANITIZE=leak
The ls_refs() function (for the v2 protocol command of the same name)
takes a repository parameter (like all v2 commands), but ignores it. It
should use it to access the refs.
This isn't a bug in practice, since we only call this function when
serving upload-pack from the main repository. But it's an awkward
gotcha, and it causes -Wunused-parameter to complain.
The main reason we don't use the repository parameter is that the ref
iteration interface we call doesn't have a "refs_" variant that takes a
ref_store. However we can easily add one. In fact, since there is only
one other caller (in ref-filter.c), there is no need to maintain the
non-repository wrapper; that caller can just use the_repository. It's
still a long way from consistently using a repository object, but it's
one small step in the right direction.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix various leaks in built-ins, libraries and a test helper here we
were missing a call to strbuf_release(), string_list_clear() etc, or
were calling them after a potential "return".
Comments on individual changes:
- builtin/checkout.c: Fix a memory leak that was introduced in [1]. A
sibling leak introduced in [2] was recently fixed in [3]. As with [3]
we should be using the wt_status_state_free_buffers() API introduced
in [4].
- builtin/repack.c: Fix a leak that's been here since this use of
"strbuf_release()" was added in a1bbc6c017 (repack: rewrite the shell
script in C, 2013-09-15). We don't use the variable for anything
except this loop, so we can instead free it right afterwards.
- builtin/rev-parse: Fix a leak that's been here since this code was
added in 21d4783538 (Add a parseopt mode to git-rev-parse to bring
parse-options to shell scripts., 2007-11-04).
- builtin/stash.c: Fix a couple of leaks that have been here since
this code was added in d4788af875 (stash: convert create to builtin,
2019-02-25), we strbuf_release()'d only some of the "struct strbuf" we
allocated earlier in the function, let's release all of them.
- ref-filter.c: Fix a leak in 482c119186 (gpg-interface: improve
interface for parsing tags, 2021-02-11), we don't use the "payload"
variable that we ask parse_signature() to populate for us, so let's
free it.
- t/helper/test-fake-ssh.c: Fix a leak that's been here since this
code was added in 3064d5a38c (mingw: fix t5601-clone.sh,
2016-01-27). Let's free the "struct strbuf" as soon as we don't need
it anymore.
1. c45f0f525d (switch: reject if some operation is in progress,
2019-03-29)
2. 2708ce62d2 (branch: sort detached HEAD based on a flag,
2021-01-07)
3. abcac2e19f (ref-filter.c: fix a leak in get_head_description,
2022-09-25)
4. 962dd7ebc3 (wt-status: introduce wt_status_state_free_buffers(),
2020-09-27).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This commit fixes a bug when parsing tags that have CRLF line endings, a
signature, and no body, like this (the "^M" are marking the CRs):
this is the subject^M
-----BEGIN PGP SIGNATURE-----^M
^M
...some stuff...^M
-----END PGP SIGNATURE-----^M
When trying to find the start of the body, we look for a blank line
separating the subject and body. In this case, there isn't one. But we
search for it using strstr(), which will find the blank line in the
signature.
In the non-CRLF code path, we check whether the line we found is past
the start of the signature, and if so, put the body pointer at the start
of the signature (effectively making the body empty). But the CRLF code
path doesn't catch the same case, and we end up with the body pointer in
the middle of the signature field. This has two visible problems:
- printing %(contents:subject) will show part of the signature, too,
since the subject length is computed as (body - subject)
- the length of the body is (sig - body), which makes it negative.
Asking for %(contents:body) causes us to cast this to a very large
size_t when we feed it to xmemdupz(), which then complains about
trying to allocate too much memory.
These are essentially the same bugs fixed in the previous commit, except
that they happen when there is a CRLF blank line in the signature,
rather than no blank line at all. Both are caused by the refactoring in
9f75ce3d8f (ref-filter: handle CRLF at end-of-line more gracefully,
2020-10-29).
We can fix this by doing the same "sigstart" check that we do in the
non-CRLF case. And rather than repeat ourselves, we can just use
short-circuiting OR to collapse both cases into a single conditional.
I.e., rather than:
if (strstr("\n\n"))
...found blank, check if it's in signature...
else if (strstr("\r\n\r\n"))
...found blank, check if it's in signature...
else
...no blank line found...
we can collapse this to:
if (strstr("\n\n")) ||
strstr("\r\n\r\n")))
...found blank, check if it's in signature...
else
...no blank line found...
The tests show the problem and the fix. Though it wasn't broken, I
included contents:signature here to make sure it still behaves as
expected, but note the shell hackery needed to make it work. A
less-clever option would be to skip using test_atom and just "append_cr
>expected" ourselves.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When ref-filter is asked to show %(content:subject), etc, we end up in
find_subpos() to parse out the three major parts: the subject, the body,
and the signature (if any).
When searching for the blank line between the subject and body, if we
don't find anything, we try to treat the whole message as the subject,
with no body. But our idea of "the whole message" needs to take into
account the signature, too. Since 9f75ce3d8f (ref-filter: handle CRLF at
end-of-line more gracefully, 2020-10-29), the code instead goes all the
way to the end of the buffer, which produces confusing output.
Here's an example. If we have a tag message like this:
this is the subject
-----BEGIN SSH SIGNATURE-----
...some stuff...
-----END SSH SIGNATURE-----
then the current parser will put the start of the body at the end of the
whole buffer. This produces two buggy outcomes:
- since the subject length is computed as (body - subject), showing
%(contents:subject) will print both the subject and the signature,
rather than just the single line
- since the body length is computed as (sig - body), and the body now
starts _after_ the signature, we end up with a negative length!
Fortunately we never access out-of-bounds memory, because the
negative length is fed to xmemdupz(), which casts it to a size_t,
and xmalloc() bails trying to allocate an absurdly large value.
In theory it would be possible for somebody making a malicious tag
to wrap it around to a more reasonable value, but it would require a
tag on the order of 2^63 bytes. And even if they did, all they get
is an out of bounds string read. So the security implications are
probably not interesting.
We can fix both by correctly putting the start of the body at the same
index as the start of the signature (effectively making the body empty).
Note that this is a real issue with signatures generated with gpg.format
set to "ssh", which would look like the example above. In the new tests
here I use a hard-coded tag message, for a few reasons:
- regardless of what the ssh-signing code produces now or in the
future, we should be testing this particular case
- skipping the actual signature makes the tests simpler to write (and
allows them to run on more systems)
- t6300 has helpers for working with gpg signatures; for the purposes
of this bug, "BEGIN PGP" is just as good a demonstration, and this
simplifies the tests
Curiously, the same issue doesn't happen with real gpg signatures (and
there are even existing tests in t6300 with cover this). Those have a
blank line between the header and the content, like:
this is the subject
-----BEGIN PGP SIGNATURE-----
...some stuff...
-----END PGP SIGNATURE-----
Because we search for the subject/body separator line with a strstr(),
we find the blank line in the signature, even though it's outside of
what we'd consider the body. But that puts us unto a separate code path,
which realizes that we're now in the signature and adjusts the line back
to "sigstart". So this patch is basically just making the "no line found
at all" case match that. And note that "sigstart" is always defined (if
there is no signature, it points to the end of the buffer as you'd
expect).
Reported-by: Martin Englund <martin@englund.nu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
In 2708ce62d2 (branch: sort detached HEAD based on a flag, 2021-01-07) a
call to wt_status_state_free_buffers, responsible of freeing the
resources that could be allocated in the local struct wt_status_state
state, was eliminated.
The call to wt_status_state_free_buffers was introduced in 962dd7ebc3
(wt-status: introduce wt_status_state_free_buffers(), 2020-09-27). This
commit brings back that call in get_head_description.
Signed-off-by: Rubén Justo <rjusto@gmail.com>
Reviewed-by: Martin Ågren <martin.agren@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.
Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.
This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.
1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hashmap comparison functions must conform to a particular callback
interface, but many don't use all of their parameters. Especially the
void cmp_data pointer, but some do not use keydata either (because they
can easily form a full struct to pass when doing lookups). Let's mark
these to make -Wunused-parameter happy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various ref-filter options like "--contains" or "--merged" may cause us
to traverse large segments of the history graph. It's counter-productive
to have save_commit_buffer turned on, as that will instruct the commit
code to cache in-memory the object contents for each commit we traverse.
This increases the amount of heap memory used while providing little or
no benefit, since we're not actually planning to display those commits
(which is the usual reason that tools like git-log want to keep them
around). We can easily disable this feature while ref-filter is running.
This lowers peak heap (as measured by massif) for running:
git tag --contains 1da177e4c3
in linux.git from ~100MB to ~20MB. It also seems to improve runtime by
4-5% (600ms vs 630ms).
A few points to note:
- it should be safe to temporarily disable save_commit_buffer like
this. The saved buffers are accessed through get_commit_buffer(),
which treats the saved ones like a cache, and loads on-demand from
the object database on a cache miss. So any code that was using this
would not be wrong, it might just incur an extra object lookup for
some objects. But...
- I don't think any ref-filter related code is using the cache. While
it's true that an option like "--format=%(*contents:subject)" or
"--sort=*authordate" will need to look at the commit contents,
ref-filter doesn't use get_commit_buffer() to do so! It always reads
the objects directly via read_object_file(), though it does avoid
re-reading objects if the format can be satisfied without them.
Timing "git tag --format=%(*authordate)" shows that we're the same
before and after, as expected.
- Note that all of this assumes you don't have a commit-graph file. if
you do, then the heap usage is even lower, and the runtime is 10x
faster. So in that sense this is not urgent, as there's a much
better solution. But since it's such an obvious and easy win for
fallback cases (including commits which aren't yet in the graph
file), there's no reason not to.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Plug the memory leaks from the trickiest API of all, the revision
walker.
* ab/plug-leak-in-revisions: (27 commits)
revisions API: add a TODO for diff_free(&revs->diffopt)
revisions API: have release_revisions() release "topo_walk_info"
revisions API: have release_revisions() release "date_mode"
revisions API: call diff_free(&revs->pruning) in revisions_release()
revisions API: release "reflog_info" in release revisions()
revisions API: clear "boundary_commits" in release_revisions()
revisions API: have release_revisions() release "prune_data"
revisions API: have release_revisions() release "grep_filter"
revisions API: have release_revisions() release "filter"
revisions API: have release_revisions() release "cmdline"
revisions API: have release_revisions() release "mailmap"
revisions API: have release_revisions() release "commits"
revisions API users: use release_revisions() for "prune_data" users
revisions API users: use release_revisions() with UNLEAK()
revisions API users: use release_revisions() in builtin/log.c
revisions API users: use release_revisions() in http-push.c
revisions API users: add "goto cleanup" for release_revisions()
stash: always have the owner of "stash_info" free it
revisions API users: use release_revisions() needing REV_INFO_INIT
revision.[ch]: document and move code declared around "init"
...
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.
* ep/maint-equals-null-cocci:
tree-wide: apply equals-null.cocci
tree-wide: apply equals-null.cocci
contrib/coccinnelle: add equals-null.cocci
Add a release_revisions() to various users of "struct rev_list" in
those straightforward cases where we only need to add the
release_revisions() call to the end of a block, and don't need to
e.g. refactor anything to use a "goto cleanup" pattern.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak in the parse_date_format() function by providing a
new date_mode_release() companion function.
By using this in "t/helper/test-date.c" we can mark the
"t0006-date.sh" test as passing when git is compiled with
SANITIZE=leak, and whitelist it to run under
"GIT_TEST_PASSING_SANITIZE_LEAK=true" by adding
"TEST_PASSES_SANITIZE_LEAK=true" to the test itself.
The other tests that expose this memory leak (i.e. take the
"mode->type == DATE_STRFTIME" branch in parse_date_format()) are
"t6300-for-each-ref.sh" and "t7004-tag.sh". The former is due to an
easily fixed leak in "ref-filter.c", and brings the failures in
"t6300-for-each-ref.sh" down from 51 to 48.
Fixing the remaining leaks will have to wait until there's a
release_revisions() in "revision.c", as they have to do with leaks via
"struct rev_info".
There is also a leak in "builtin/blame.c" due to its call to
parse_date_format() to parse the "blame.date" configuration. However
as it declares a file-level "static struct date_mode blame_date_mode"
to track the data, LSAN will not report it as a leak. It's possible to
get valgrind(1) to complain about it with e.g.:
valgrind --leak-check=full --show-leak-kinds=all ./git -P -c blame.date=format:%Y blame README.md
But let's focus on things LSAN complains about, and are thus
observable with "TEST_PASSES_SANITIZE_LEAK=true". We should get to
fixing memory leaks in "builtin/blame.c", but as doing so would
require some re-arrangement of cmd_blame() let's leave it for some
other time.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Provide and use a DATE_MODE_INIT macro. Most of the users of struct
date_mode" use it via pretty.h's "struct pretty_print_context" which
doesn't have an initialization macro, so we're still bound to being
initialized to "{ 0 }" by default.
But we can change the couple of callers that directly declared a
variable on the stack to instead use the initializer, and thus do away
with the "mode.local = 0" added in add00ba2de (date: make "local"
orthogonal to date format, 2015-09-03).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Things like "git -c branch.sort=bogus branch new HEAD", i.e. the
operation modes of the "git branch" command that do not need the
sort key information, no longer errors out by seeing a bogus sort
key.
* jc/fix-ref-sorting-parse:
for-each-ref: delay parsing of --sort=<atom> options
Emir and Jean-Noël reported typos in some i18n messages when preparing
l10n for git 2.34.0.
* Fix unstable spelling of config variable "gpg.ssh.defaultKeyCommand"
which was introduced in commit fd9e226776 (ssh signing: retrieve a
default key from ssh-agent, 2021-09-10).
* Add missing space between "with" and "--python" which was introduced
in commit bd0708c7eb (ref-filter: add %(raw) atom, 2021-07-26).
* Fix unmatched single quote in 'builtin/index-pack.c' which was
introduced in commit 8737dab346 (index-pack: refactor renaming in
final(), 2021-09-09)
[1] https://github.com/git-l10n/git-po/pull/567
Reported-by: Emir Sarı <bitigchi@me.com>
Reported-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The for-each-ref family of commands invoke parsers immediately when
it sees each --sort=<atom> option, and die before even seeing the
other options on the command line when the <atom> is unrecognised.
Instead, accumulate them in a string list, and have them parsed into
a ref_sorting structure after the command line parsing is done. As
a consequence, "git branch --sort=bogus -h" used to fail to give the
brief help, which arguably may have been a feature, now does so,
which is more consistent with how other options work.
The patch is smaller than the actual extent of the "damage" to the
codebase, thanks to the fact that the original code consistently
used OPT_REF_SORT() macro to handle command line options. We only
needed to replace the variable used for the list, and implementation
of the callback function used in the macro.
The old rule was for the users of the API to:
- Declare ref_sorting and ref_sorting_tail variables;
- OPT_REF_SORT() macro will instantiate ref_sorting instance (which
may barf and die) and append it to the tail;
- Append to the tail each ref_sorting read from the configuration
by parsing in the config callback (which may barf and die);
- See if ref_sorting is null and use ref_sorting_default() instead.
Now the rule is not all that different but is simpler:
- Declare ref_sorting_options string list.
- OPT_REF_SORT() macro will append it to the string list;
- Append to the string list the sort key read from the
configuration;
- call ref_sorting_options() to turn the string list to ref_sorting
structure (which also deals with the default value).
As side effects, this change also cleans up a few issues:
- 95be717c (parse_opt_ref_sorting: always use with NONEG flag,
2019-03-20) muses that "git for-each-ref --no-sort" should simply
clear the sort keys accumulated so far; it now does.
- The implementation detail of "struct ref_sorting" and the helper
function parse_ref_sorting() can now be private to the ref-filter
API implementation.
- If you set branch.sort to a bogus value, the any "git branch"
invocation, not only the listing mode, would abort with the
original code; now it doesn't
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a ref_sorting_release() and use it for some of the current API
users, the ref_sorting_default() function and its siblings will do a
malloc() which wasn't being free'd previously.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In C it isn't required to specify that all members of a struct are
zero'd out to 0, NULL or '\0', just providing a "{ 0 }" will
accomplish that.
Let's also change code that provided N zero'd fields to just
provide one, and change e.g. "{ NULL }" to "{ 0 }" for
consistency. I.e. even if the first member is a pointer let's use "0"
instead of "NULL". The point of using "0" consistently is to pick one,
and to not have the reader wonder why we're not using the same pattern
everywhere.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No callers pass in anything but "0" here. Likewise to our sibling
functions. Note that some of them ferry along the flag, but none of
their callers pass anything but "0" either.
Nor is anybody likely to change that. Callers which really want to see
all of the raw refs use for_each_rawref(). And anybody interested in
iterating a subset of the refs will likely be happy to use the
now-default behavior of showing broken refs, but omitting dangling
symlinks.
So we can get rid of this whole feature.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that none of our callers passes the INCLUDE_BROKEN flag, we can drop
it entirely, along with the code to plumb it through to the
for_each_fullref_in() functions.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare the "ref-filter" machinery that drives the "--format"
option of "git for-each-ref" and its friends to be used in "git
cat-file --batch".
* zh/ref-filter-raw-data:
ref-filter: add %(rest) atom
ref-filter: use non-const ref_format in *_atom_parser()
ref-filter: --format=%(raw) support --perl
ref-filter: add %(raw) atom
ref-filter: add obj-type check in grab contents
Leak plugging.
* ah/plugleaks:
reset: clear_unpack_trees_porcelain to plug leak
builtin/rebase: fix options.strategy memory lifecycle
builtin/merge: free found_ref when done
builtin/mv: free or UNLEAK multiple pointers at end of cmd_mv
convert: release strbuf to avoid leak
read-cache: call diff_setup_done to avoid leak
ref-filter: also free head for ATOM_HEAD to avoid leak
diffcore-rename: move old_dir/new_dir definition to plug leak
builtin/for-each-repo: remove unnecessary argv copy to plug leak
builtin/submodule--helper: release unused strbuf to avoid leak
environment: move strbuf into block to plug leak
fmt-merge-msg: free newly allocated temporary strings when done
u.head is populated using resolve_refdup(), which returns a newly
allocated string - hence we also need to free() it.
Found while running t0041 with LSAN:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0xa8be98 in xstrdup wrapper.c:29:14
#2 0x9481db in head_atom_parser ref-filter.c:549:17
#3 0x9408c7 in parse_ref_filter_atom ref-filter.c:703:30
#4 0x9400e3 in verify_ref_format ref-filter.c:974:8
#5 0x4f9e8b in print_ref_list builtin/branch.c:439:6
#6 0x4f9e8b in cmd_branch builtin/branch.c:757:3
#7 0x4ce83e in run_builtin git.c:475:11
#8 0x4ccafe in handle_builtin git.c:729:3
#9 0x4cb01c in run_argv git.c:818:4
#10 0x4cb01c in cmd_main git.c:949:19
#11 0x6bdc2d in main common-main.c:52:11
#12 0x7f96edf86349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 16 byte(s) leaked in 1 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
%(rest) is a atom used for cat-file batch mode, which can split
the input lines at the first whitespace boundary, all characters
before that whitespace are considered to be the object name;
characters after that first run of whitespace (i.e., the "rest"
of the line) are output in place of the %(rest) atom.
In order to let "cat-file --batch=%(rest)" use the ref-filter
interface, add %(rest) atom for ref-filter.
Introduce the reject_atom() to reject the atom %(rest) for
"git for-each-ref", "git branch", "git tag" and "git verify-tag".
Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
Suggected-by: Jacob Keller <jacob.keller@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use non-const ref_format in *_atom_parser(), which can help us
modify the members of ref_format in *_atom_parser().
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because the perl language can handle binary data correctly,
add the function perl_quote_buf_with_len(), which can specify
the length of the data and prevent the data from being truncated
at '\0' to help `--format="%(raw)"` support `--perl`.
Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new formatting option `%(raw)`, which will print the raw
object data without any changes. It will help further to migrate
all cat-file formatting logic from cat-file to ref-filter.
The raw data of blob, tree objects may contain '\0', but most of
the logic in `ref-filter` depends on the output of the atom being
text (specifically, no embedded NULs in it).
E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
add the data to the buffer. The raw data of a tree object is
`100644 one\0...`, only the `100644 one` will be added to the buffer,
which is incorrect.
Therefore, we need to find a way to record the length of the
atom_value's member `s`. Although strbuf can already record the
string and its length, if we want to replace the type of atom_value's
member `s` with strbuf, many places in ref-filter that are filled
with dynamically allocated mermory in `v->s` are not easy to replace.
At the same time, we need to check if `v->s == NULL` in
populate_value(), and strbuf cannot easily distinguish NULL and empty
strings, but c-style "const char *" can do it. So add a new member in
`struct atom_value`: `s_size`, which can record raw object size, it
can help us add raw object data to the buffer or compare two buffers
which contain raw object data.
Note that `--format=%(raw)` cannot be used with `--python`, `--shell`,
`--tcl`, and `--perl` because if the binary raw data is passed to a
variable in such languages, these may not support arbitrary binary data
in their string variable type.
Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Bagas Sanjaya <bagasdotme@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Olga Telezhnaya <olyatelezhnaya@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only tag and commit objects use `grab_sub_body_contents()` to grab
object contents in the current codebase. We want to teach the
function to also handle blobs and trees to get their raw data,
without parsing a blob (whose contents looks like a commit or a tag)
incorrectly as a commit or a tag. So it's needed to pass a
`struct expand_data *data` instread of only `void *buf` to both
`grab_sub_body_contents()` and `grab_values()` to be able to check
the object type.
Skip the block of code that is specific to handling commits and tags
early when the given object is of a wrong type to help later
addition to handle other types of objects in this function.
Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add missing __attribute__((format)) function attributes to various
"static" functions that take printf arguments.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the original ref-filter design, it will copy the parsed
atom's name and attributes to `used_atom[i].name` in the
atom's parsing step, and use it again for string matching
in the later specific ref attributes filling step. It use
a lot of string matching to determine which atom we need.
Introduce the enum "atom_type", each enum value is named
as `ATOM_*`, which is the index of each corresponding
valid_atom entry. In the first step of the atom parsing,
`used_atom.atom_type` will record corresponding enum value
from valid_atom entry index, and then in specific reference
attribute filling step, only need to compare the value of
the `used_atom[i].atom_type` to check the atom type.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the support for "objectsize:disk" was bolted onto the
existing support for "objectsize", it didn't follow the
usual pattern for handling "atomtype:modifier", which reads
the <modifier> part just once while parsing the format
string, and store the parsed result in the union in the
used_atom structure, so that the string form of it does not
have to be parsed over and over at runtime (e.g. in
grab_common_values()).
Add a new member `objectsize` to the union `used_atom.u`,
so that we can separate the check of <modifier> from the
check of <atomtype>, this will bring scalability to atom
`%(objectsize)`.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inlining the exported function `show_ref_array_item()`,
which is not providing the right level of abstraction,
simplifies the API and can unlock improvements at the
former call sites.
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A NULL-dereference bug has been corrected in an error codepath in
"git for-each-ref", "git branch --list" etc.
* jk/ref-filter-segfault-fix:
ref-filter: fix NULL check for parse object failure