Commit Graph

28 Commits

Author SHA1 Message Date
fc1ddf42af t: remove TEST_PASSES_SANITIZE_LEAK annotations
Now that the default value for TEST_PASSES_SANITIZE_LEAK is `true` there
is no longer a need to have that variable declared in all of our tests.
Drop it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 08:23:48 +09:00
365529e1ea Merge branch 'ps/leakfixes-part-7'
More leak-fixes.

* ps/leakfixes-part-7: (23 commits)
  diffcore-break: fix leaking filespecs when merging broken pairs
  revision: fix leaking parents when simplifying commits
  builtin/maintenance: fix leak in `get_schedule_cmd()`
  builtin/maintenance: fix leaking config string
  promisor-remote: fix leaking partial clone filter
  grep: fix leaking grep pattern
  submodule: fix leaking submodule ODB paths
  trace2: destroy context stored in thread-local storage
  builtin/difftool: plug several trivial memory leaks
  builtin/repack: fix leaking configuration
  diffcore-order: fix leaking buffer when parsing orderfiles
  parse-options: free previous value of `OPTION_FILENAME`
  diff: fix leaking orderfile option
  builtin/pull: fix leaking "ff" option
  dir: fix off by one errors for ignored and untracked entries
  builtin/submodule--helper: fix leaking remote ref on errors
  t/helper: fix leaking subrepo in nested submodule config helper
  builtin/submodule--helper: fix leaking error buffer
  builtin/submodule--helper: clear child process when not running it
  submodule: fix leaking update strategy
  ...
2024-10-02 07:46:26 -07:00
6932ec8183 diffcore-order: fix leaking buffer when parsing orderfiles
In `prepare_order()` we parse an orderfile and assign it to a global
array. In order to save on some allocations, we replace newlines with
NUL characters and then assign pointers into the allocated buffer to
that array. This can cause the buffer to be completely unreferenced
though in some cases, e.g. because the order file is empty or because we
had to use `xmemdupz()` to copy the lines instead of NUL-terminating
them.

Refactor the code to always `xmemdupz()` the strings. This is a bit
simpler, and it is rather unlikely that saving a handful of allocations
really matters. This allows us to release the string buffer and thus
plug the memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:35 -07:00
d6bf6527eb Revert "Merge branch 'jc/patch-id' into maint-2.46"
This reverts commit 41c952ebac,
reversing changes made to 712d970c01.
Keeping a known breakage for now is better than introducing new
regression(s).
2024-09-16 15:12:06 -07:00
41c952ebac Merge branch 'jc/patch-id' into maint-2.46
The patch parser in "git patch-id" has been tightened to avoid
getting confused by lines that look like a patch header in the log
message.
cf. <Zqh2T_2RLt0SeKF7@tanuki>

* jc/patch-id:
  patch-id: tighten code to detect the patch header
  patch-id: rewrite code that detects the beginning of a patch
  patch-id: make get_one_patchid() more extensible
  patch-id: call flush_current_id() only when needed
  t4204: patch-id supports various input format
2024-09-12 11:02:16 -07:00
44773b9f70 Merge branch 'jc/patch-id'
The patch parser in "git patch-id" has been tightened to avoid
getting confused by lines that look like a patch header in the log
message.

* jc/patch-id:
  patch-id: tighten code to detect the patch header
  patch-id: rewrite code that detects the beginning of a patch
  patch-id: make get_one_patchid() more extensible
  patch-id: call flush_current_id() only when needed
  t4204: patch-id supports various input format
2024-08-14 14:54:53 -07:00
a6e9429f72 patch-id: tighten code to detect the patch header
The get_one_patchid() function unconditionally takes a line that
matches the patch header (namely, a line that begins with a full
object name, possibly prefixed by "commit" or "From" plus a space)
as the beginning of a patch.  Even when it is *not* looking for one
(namely, when the previous call found the patch header and returned,
and then we are called again to skip the log message and process the
patch whose header was found by the previous invocation).

As a consequence, a line in the commit log message that begins with
one of these patterns can be mistaken to start another patch, with
current message entirely skipped (because we haven't even reached
the patch at all).

Allow the caller to tell us if it called us already and saw the
patch header (in which case we shouldn't be looking for another one,
until we see the "diff" part of the patch; instead we simply should
be skipping these lines as part of the commit log message), and skip
the header processing logic when that is the case.  In the helper
function, it also needs to flip this "are we looking for a header?"
bit, once it finished skipping the commit log message and started
processing the patches, as the patch header of the _next_ message is
the only clue in the input that the current patch is done.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-29 18:19:14 -07:00
4210ea6f0f t4204: patch-id supports various input format
"git patch-id" was first developed to read from "git diff-tree
--stdin -p" output.  Later it was enhanced to read from "git
diff-tree --stdin -p -v", which was the downstream of an early
imitation of "git log" ("git rev-list" run in the upstream of a pipe
to feed the "diff-tree").  These days, we also read from "git
format-patch".

Their output begins slightly differently, but the patch-id computed
over them for the same commit should be the same.  Ensure that we
won't accidentally break this expectation.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-29 18:19:14 -07:00
4a1c95931f builtin/patch-id: fix uninitialized hash function
In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), we have adapted `initialize_repository()` to no longer set
up a default hash function. As this function is also used to set up
`the_repository`, the consequence is that `the_hash_algo` will now by
default be a `NULL` pointer unless the hash algorithm was configured
properly. This is done as a mechanism to detect cases where we may be
using the wrong hash function by accident.

This change now causes git-patch-id(1) to segfault when it's run outside
of a repository. As this command can read diffs from stdin, it does not
necessarily need a repository, but then relies on `the_hash_algo` to
compute the patch ID itself.

It is somewhat dubious that git-patch-id(1) relies on `the_hash_algo` in
the first place. Quoting its manpage:

    A "patch ID" is nothing but a sum of SHA-1 of the file diffs
    associated with a patch, with line numbers ignored. As such, it’s
    "reasonably stable", but at the same time also reasonably unique,
    i.e., two patches that have the same "patch ID" are almost
    guaranteed to be the same thing.

We explicitly document patch IDs to be using SHA-1. Furthermore, patch
IDs are supposed to be stable for most of the part. But even with the
same input, the patch IDs will now be different depending on the repo's
configured object hash.

Work around the issue by setting up SHA-1 when there was no startup
repository for now. This is arguably not the correct fix, but for now we
rather want to focus on getting the segfault fixed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-21 09:05:13 -07:00
2871f4d447 builtin: patch-id: add --verbatim as a command mode
There are situations where the user might not want the default
setting where patch-id strips all whitespace. They might be working
in a language where white space is syntactically important, or they
might have CI testing that enforces strict whitespace linting. In
these cases, a whitespace change would result in the patch
fundamentally changing, and thus deserving of a different id.

Add a new mode that is exclusive of --stable and --unstable called
--verbatim. It also corresponds to the config
patchid.verbatim = true. In this mode, the stable algorithm is
used and whitespace is not stripped from the patch text.

Users of --unstable mainly care about compatibility with old git
versions, which unstripping the whitespace would break. Thus there
isn't a usecase for the combination of --verbatim and --unstable,
and we don't expose this so as to not add maintainence burden.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
fixes https://github.com/Skydio/revup/issues/2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 15:44:20 -07:00
0df19eb9d9 builtin: patch-id: fix patch-id with binary diffs
"git patch-id" currently doesn't produce correct output if the
incoming diff has any binary files. Add logic to get_one_patchid
to handle the different possible styles of binary diff. This
attempts to keep resulting patch-ids identical to what would be
produced by the counterpart logic in diff.c, that is it produces
the id by hashing the a and b oids in succession.

In general we handle binary diffs by first caching the object ids from
the "index" line and using those if we then find an indication
that the diff is binary.

The input could contain patches generated with "git diff --binary". This
currently breaks the parse logic and results in multiple patch-ids
output for a single commit. Here we have to skip the contents of the
patch itself since those do not go into the patch id. --binary
implies --full-index so the object ids are always available.

When the diff is generated with --full-index there is no patch content
to skip over.

When a diff is generated without --full-index or --binary, it will
contain abbreviated object ids. This will still result in a sufficiently
unique patch-id when hashed, but does not match internal patch id
output. We'll call this ok for now as we already need specialized
arguments to diff in order to match internal patch id (namely -U3).

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 15:44:19 -07:00
757e75c81e patch-id: fix scan_hunk_header on diffs with 1 line of before/after
Normally diffs will contain a hunk header of the format
"@@ -2,2 +2,15 @@ code". However when there is only 1 line of
change, the unified diff format allows for the second comma
separated value to be omitted in either before or after
line counts.

This can produce hunk headers that look like
"@@ -2 +2,18 @@ code" or "@@ -2,2 +2 @@ code".
As a result, scan_hunk_header mistakenly returns the line
number as line count, which then results in unpredictable
parsing errors with the rest of the patch, including giving
multiple lines of output for a single commit.

Fix by explicitly setting line count to 1 when there is
no comma, and add a test.

apply.c contains this same logic except it is correct. A
worthwhile future project might be to unify these two diff
parsers so they both benefit from fixes.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 11:24:23 -08:00
56fa5ac39a patch-id: fix antipatterns in tests
Clean up the tests for patch-id by moving file preparation
tasks inside the test body and redirecting files directly into
stdin instead of using 'cat'.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 11:24:15 -08:00
6b2edaa378 t4204 is not sanitizer clean at all
Earlier we marked that this patch-id test is leak-sanitizer clean,
but if we read the test script carefully, it is merely because we
have too many invocations of commands in the "git log" family on the
upstream side of the pipe, hiding breakages from them.

Split the pipeline so that breakages from these commands can be
caught (not limited to aborts due to leak-sanitizer) and unmark
the script as not passing the test with leak-sanitizer in effect.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-16 15:18:31 -08:00
9081a421a6 checkout: fix "branch info" memory leaks
The "checkout" command is one of the main sources of leaks in the test
suite, let's fix the common ones by not leaking from the "struct
branch_info".

Doing this is rather straightforward, albeit verbose, we need to
xstrdup() constant strings going into the struct, and free() the ones
we clobber as we go along.

This also means that we can delete previous partial leak fixes in this
area, i.e. the "path_to_free" accounting added by 96ec7b1e70 (Convert
resolve_ref+xstrdup to new resolve_refdup function, 2011-12-13).

There was some discussion about whether "we should retain the "const
char *" here and cast at free() time, or have it be a "char *". Since
this is not a public API with any sort of API boundary let's use
"char *", as is already being done for the "refname" member of the
same struct.

The tests to mark as passing were found with:

    rm .prove; GIT_SKIP_TESTS=t0027 prove -j8 --state=save t[0-9]*.sh :: --immediate
    # apply & compile this change
    prove -j8 --state=failed :: --immediate

I.e. the ones that were newly passing when the --state=failed command
was run. I left out "t3040-subprojects-basic.sh" and
"t4131-apply-fake-ancestor.sh" to to optimization-level related
differences similar to the ones noted in[1], except that these would
be something the current 'linux-leaks' job would run into.

1. https://lore.kernel.org/git/cover-v3-0.6-00000000000-20211022T175227Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-18 14:32:26 -08:00
8f37854b18 t4*: adjust the references to the default branch name "main"
Carefully excluding t4013 and t4015, which see independent development
elsewhere at the time of writing, we use `main` as the default branch
name in t4*. This trick was performed via

	$ (cd t &&
	   sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
		-e 's/Master/Main/g' -- t4*.sh t4211/*.export &&
	   git checkout HEAD -- t4013\*)

This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19 15:44:18 -08:00
334afbc76f tests: mark tests relying on the current default for init.defaultBranch
In addition to the manual adjustment to let the `linux-gcc` CI job run
the test suite with `master` and then with `main`, this patch makes sure
that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts
that currently rely on the initial branch name being `master by default.

To determine which test scripts to mark up, the first step was to
force-set the default branch name to `master` in

- all test scripts that contain the keyword `master`,

- t4211, which expects `t/t4211/history.export` with a hard-coded ref to
  initialize the default branch,

- t5560 because it sources `t/t556x_common` which uses `master`,

- t8002 and t8012 because both source `t/annotate-tests.sh` which also
  uses `master`)

This trick was performed by this command:

	$ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\
	GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
	' $(git grep -l master t/t[0-9]*.sh) \
	t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh

After that, careful, manual inspection revealed that some of the test
scripts containing the needle `master` do not actually rely on a
specific default branch name: either they mention `master` only in a
comment, or they initialize that branch specificially, or they do not
actually refer to the current default branch. Therefore, the
aforementioned modification was undone in those test scripts thusly:

	$ git checkout HEAD -- \
		t/t0027-auto-crlf.sh t/t0060-path-utils.sh \
		t/t1011-read-tree-sparse-checkout.sh \
		t/t1305-config-include.sh t/t1309-early-config.sh \
		t/t1402-check-ref-format.sh t/t1450-fsck.sh \
		t/t2024-checkout-dwim.sh \
		t/t2106-update-index-assume-unchanged.sh \
		t/t3040-subprojects-basic.sh t/t3301-notes.sh \
		t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \
		t/t3436-rebase-more-options.sh \
		t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \
		t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \
		t/t5511-refspec.sh t/t5526-fetch-submodules.sh \
		t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \
		t/t5548-push-porcelain.sh \
		t/t5552-skipping-fetch-negotiator.sh \
		t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \
		t/t5614-clone-submodules-shallow.sh \
		t/t7508-status.sh t/t7606-merge-custom.sh \
		t/t9302-fast-import-unpack-limit.sh

We excluded one set of test scripts in these commands, though: the range
of `git p4` tests. The reason? `git p4` stores the (foreign) remote
branch in the branch called `p4/master`, which is obviously not the
default branch. Manual analysis revealed that only five of these tests
actually require a specific default branch name to pass; They were
modified thusly:

	$ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\
	GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
	' t/t980[0167]*.sh t/t9811*.sh

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19 15:44:17 -08:00
7a1bcb251b t4204: make hash size independent
Use $OID_REGEX instead of a hard-coded regular expression.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-15 14:06:19 -08:00
d845d727cb Merge branch 'jk/setup-sequence-update'
There were numerous corner cases in which the configuration files
are read and used or not read at all depending on the directory a
Git command was run, leading to inconsistent behaviour.  The code
to set-up repository access at the beginning of a Git process has
been updated to fix them.

* jk/setup-sequence-update:
  t1007: factor out repeated setup
  init: reset cached config when entering new repo
  init: expand comments explaining config trickery
  config: only read .git/config from configured repos
  test-config: setup git directory
  t1302: use "git -C"
  pager: handle early config
  pager: use callbacks instead of configset
  pager: make pager_program a file-local static
  pager: stop loading git_default_config()
  pager: remove obsolete comment
  diff: always try to set up the repository
  diff: handle --no-index prefixes consistently
  diff: skip implicit no-index check when given --no-index
  patch-id: use RUN_SETUP_GENTLY
  hash-object: always try to set up the git repository
2016-09-21 15:15:24 -07:00
4a73aaaf18 patch-id: use RUN_SETUP_GENTLY
Patch-id does not require a repository because it is just
processing the incoming diff on stdin, but it may look at
git config for keys like patchid.stable.

Even though we do not setup_git_directory(), this works from
the top-level of a repository because we blindly look at
".git/config" in this case. But as the included test
demonstrates, it does not work from a subdirectory.

We can fix it by using RUN_SETUP_GENTLY. We do not take any
filenames from the user on the command line, so there's no
need to adjust them via prefix_filename().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
5c63920190 t4204: do not let $name variable clobbered
test_patch_id_file_order shell function uses $name variable to hold
one filename, and calls another shell function calc_patch_id as a
downstream of one pipeline.  The called function, however, also uses
the same $name variable.  With a shell implementation that runs the
callee in the current shell environment, the caller's $name would
be clobbered by the callee's use of the same variable.

This hasn't been an issue with dash and bash.  ksh93 reveals the
breakage in the test script.

Fix it by using a distinct variable name in the callee.

Reported-by: Armin Kunaschik <megabreit@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-24 15:49:02 -07:00
79e539404c Merge branch 'mt/patch-id-stable' (early part)
* 'mt/patch-id-stable' (early part):
  patch-id-test: test stable and unstable behaviour
  patch-id: make it stable against hunk reordering
  test doc: test_write_lines does not split its arguments
  test: add test_write_lines helper
2014-06-16 10:05:38 -07:00
8f2514e95f patch-id-test: test stable and unstable behaviour
Verify that patch ID supports an algorithm
that is stable against diff split and reordering.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-10 13:09:39 -07:00
ce21ccfae0 t4204-patch-id.sh: use the $( ... ) construct for command substitution
The Git CodingGuidelines prefer the $(...) construct for command
substitution instead of using the backquotes `...`.

The backquoted form is the traditional method for command
substitution, and is supported by POSIX.  However, all but the
simplest uses become complicated quickly.  In particular, embedded
command substitutions and/or the use of double quotes require
careful escaping with the backslash character.

The patch was generated by:

for _f in $(find . -name "*.sh")
do
   sed -i 's@`\(.*\)`@$(\1)@g' ${_f}
done

and then carefully proof-read.

Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-30 11:08:09 -07:00
2485eab55c git-patch-id: do not trip over "no newline" markers
Currently, patch-id trips over our very own diff extension for marking
the absence of newline at EOF.

Fix it. (Ignore it, it's whitespace.)

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-17 11:56:50 -08:00
f2b5e7af10 git-patch-id: test for "no newline" markers
Currently, patch-id trips over our very own output that marks the absence
of newline at EOF.

Expose this in a test.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-17 11:54:57 -08:00
580fb25b7a patch-id: Add support for mbox format
I have an alias that takes two arguments and compares their patch IDs.
I would like to use to make sure I've tested exactly what I submit
(patch by patch), like

   git patch-cmp origin/master.. file-being-sent

However, I cannot do that because git patch-id is fooled by the "-- "
trailer that git format-patch puts, or likely by the MIME boundary.

This patch adds hunk parsing logic to git patch-id in order to detect an
out of place "-" line and split the patch when it comes.  In addition,
commit ids in the "From " lines are considered and printed in the output.

Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-19 13:01:49 -07:00
7f733de04e test-suite: Make test script numbers unique
In order to selectively skip tests, the environment variable GIT_SKIP_TESTS
can be set like this:

  $ GIT_SKIP_TESTS='t1301 t4150.18' make test

That is, its value can contain only the test script numbers, but not the
full script name. Therefore, it is important that the test scripts are
uniquely numbered. This makes it so.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-14 12:44:20 -07:00