Commit Graph

75032 Commits

Author SHA1 Message Date
81903d0472 Merge branch 'ss/packed-ref-store-leakfix'
Leakfix.

* ss/packed-ref-store-leakfix:
  refs/files: prevent memory leak by freeing packed_ref_store
2024-08-14 14:54:57 -07:00
7b11e20bff Merge branch 'cp/unit-test-reftable-tree'
A test in reftable library has been rewritten using the unit test
framework.

* cp/unit-test-reftable-tree:
  t-reftable-tree: improve the test for infix_walk()
  t-reftable-tree: add test for non-existent key
  t-reftable-tree: split test_tree() into two sub-test functions
  t: move reftable/tree_test.c to the unit testing framework
  reftable: remove unnecessary curly braces in reftable/tree.c
2024-08-14 14:54:56 -07:00
61fd5de05f Merge branch 'kl/test-fixes'
A flakey test and incorrect calls to strtoX() functions have been
fixed.

* kl/test-fixes:
  t6421: fix test to work when repo dir contains d0
  set errno=0 before strtoX calls
2024-08-14 14:54:55 -07:00
494c9788e4 Merge branch 'jc/reflog-expire-lookup-commit-fix'
"git reflog expire" failed to honor annotated tags when computing
reachable commits.

* jc/reflog-expire-lookup-commit-fix:
  Revert "reflog expire: don't use lookup_commit_reference_gently()"
2024-08-14 14:54:55 -07:00
7a95eceb6b Merge branch 'jr/ls-files-expand-literal-doc'
Docfix.

* jr/ls-files-expand-literal-doc:
  doc: fix hex code escapes in git-ls-files
2024-08-14 14:54:54 -07:00
c147b41f4c Merge branch 'jc/leakfix-mailmap'
Leakfix.

* jc/leakfix-mailmap:
  mailmap: plug memory leak in read_mailmap_blob()
2024-08-14 14:54:54 -07:00
dfaa04f3c6 Merge branch 'jc/leakfix-hashfile'
Leakfix.

* jc/leakfix-hashfile:
  csum-file: introduce discard_hashfile()
2024-08-14 14:54:53 -07:00
44773b9f70 Merge branch 'jc/patch-id'
The patch parser in "git patch-id" has been tightened to avoid
getting confused by lines that look like a patch header in the log
message.

* jc/patch-id:
  patch-id: tighten code to detect the patch header
  patch-id: rewrite code that detects the beginning of a patch
  patch-id: make get_one_patchid() more extensible
  patch-id: call flush_current_id() only when needed
  t4204: patch-id supports various input format
2024-08-14 14:54:53 -07:00
c7ca437d9f Merge branch 'ps/refs-wo-the-repository'
In the refs subsystem, implicit reliance of the_repository has been
eliminated; the repository associated with the ref store object is
used instead.

* ps/refs-wo-the-repository:
  refs/reftable: stop using `the_repository`
  refs/packed: stop using `the_repository`
  refs/files: stop using `the_repository`
  refs/files: stop using `the_repository` in `parse_loose_ref_contents()`
  refs: stop using `the_repository`
2024-08-14 14:54:52 -07:00
5a74eb07ca Merge branch 'jc/jl-git-no-advice-fix'
Remove leftover debugging cruft from a test script.

* jc/jl-git-no-advice-fix:
  t0018: remove leftover debugging cruft
2024-08-14 14:54:51 -07:00
4cf2f1be56 Merge branch 'tb/config-fixed-value-with-valueless-true'
"git config --value=foo --fixed-value section.key newvalue" barfed
when the existing value in the configuration file used the
valueless true syntax, which has been corrected.

* tb/config-fixed-value-with-valueless-true:
  config.c: avoid segfault with --fixed-value and valueless config
2024-08-14 14:54:51 -07:00
0b2c4bc3ff Merge branch 'jk/apply-patch-mode-check-fix'
The patch parser in 'git apply' has been a bit more lenient against
unexpected mode bits, like 100664, recorded on extended header lines.

* jk/apply-patch-mode-check-fix:
  apply: canonicalize modes read from patches
2024-08-14 14:54:50 -07:00
505312a83f Merge branch 'ps/ref-api-cleanup'
Code clean-up.

* ps/ref-api-cleanup:
  refs: drop `ref_store`-less functions
2024-08-14 14:54:50 -07:00
760348212b Merge branch 'ps/ls-remote-out-of-repo-fix'
A recent update broke "git ls-remote" used outside a repository,
which has been corrected.

* ps/ls-remote-out-of-repo-fix:
  builtin/ls-remote: fall back to SHA1 outside of a repo
2024-08-14 14:54:49 -07:00
ecbed3ff45 Merge branch 'jc/transport-leakfix'
Leakfix.

* jc/transport-leakfix:
  transport: fix leak with transport helper URLs
2024-08-14 14:54:49 -07:00
4bad0119f2 Merge branch 'rh/http-proxy-path'
The value of http.proxy can have "path" at the end for a socks
proxy that listens to a unix-domain socket, but we started to
discard it when we taught proxy auth code path to use the
credential helpers, which has been corrected.

* rh/http-proxy-path:
  http: do not ignore proxy path
2024-08-14 14:54:49 -07:00
d65332f241 Merge branch 'cp/unit-test-reftable-pq'
The tests for "pq" part of reftable library got rewritten to use
the unit test framework.

* cp/unit-test-reftable-pq:
  t-reftable-pq: add tests for merged_iter_pqueue_top()
  t-reftable-pq: add test for index based comparison
  t-reftable-pq: make merged_iter_pqueue_check() callable by reference
  t-reftable-pq: make merged_iter_pqueue_check() static
  t: move reftable/pq_test.c to the unit testing framework
  reftable: change the type of array indices to 'size_t' in reftable/pq.c
  reftable: remove unnecessary curly braces in reftable/pq.c
2024-08-14 14:54:48 -07:00
6c3c451fb6 Merge branch 'jk/osxkeychain-username-is-nul-terminated'
The credential helper to talk to OSX keychain sometimes sent
garbage bytes after the username, which has been corrected.

* jk/osxkeychain-username-is-nul-terminated:
  credential/osxkeychain: respect NUL terminator in username
2024-08-14 14:54:48 -07:00
4385f8a52d Merge branch 'ps/leakfixes-part-3'
More leakfixes.

* ps/leakfixes-part-3: (24 commits)
  commit-reach: fix trivial memory leak when computing reachability
  convert: fix leaking config strings
  entry: fix leaking pathnames during delayed checkout
  object-name: fix leaking commit list items
  t/test-repository: fix leaking repository
  builtin/credential-cache: fix trivial leaks
  builtin/worktree: fix leaking derived branch names
  builtin/shortlog: fix various trivial memory leaks
  builtin/rerere: fix various trivial memory leaks
  builtin/credential-store: fix leaking credential
  builtin/show-branch: fix several memory leaks
  builtin/rev-parse: fix memory leak with `--parseopt`
  builtin/stash: fix various trivial memory leaks
  builtin/remote: fix various trivial memory leaks
  builtin/remote: fix leaking strings in `branch_list`
  builtin/ls-remote: fix leaking `pattern` strings
  builtin/submodule--helper: fix leaking buffer in `is_tip_reachable`
  builtin/submodule--helper: fix leaking clone depth parameter
  builtin/name-rev: fix various trivial memory leaks
  builtin/describe: fix trivial memory leak when describing blob
  ...
2024-08-14 14:54:47 -07:00
bbc04b0094 t9001-send-email.sh: update alias list used for pine test
The set of aliases used for the pine --dump-aliases test do not
perfectly mesh with the way the pine address book is defined. While
technically all valid, there are some oddities including bob's name
being partially split so that the actual address is returned as
"Bobbyton <bob@example.com>". A strict reading of the pine documentation
indicates that the address should either be of the form
"address@domain" or a comma separated list of address, name/address
pairs, or other aliases enclosed by ().

The parsing implementation in git-send-email is not as strict, but it
makes sense to ensure the test data used is. Although the --dump-aliases
test does not make use of the address data, it is helpful to avoid
giving future developers the wrong impression of the file format.

Also add an alias which translates to multiple addresses using the ()
format.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 12:13:00 -07:00
4b707a6e99 p1500: add is-base performance tests
The previous two changes introduced a commit walking heuristic for finding
the most likely base branch for a given source. This algorithm walks
first-parent histories until reaching a collision.

This walk _should_ be very fast. Exceptions include cases where a
commit-graph file does not exist, leading to a full walk of all reachable
commits to compute generation numbers, or a case where no collision in the
first-parent history exists, leading to a walk of all first-parent history
to the root commits.

The p1500 test script guarantees a complete commit-graph file during its
setup, so we will not test that scenario. Do create a new root commit in an
effort to test the scenario of parallel first-parent histories.

Even with the extra root commit, these tests take no longer than 0.02
seconds on my machine for the Git repository. However, the results are
slightly more interesting in a copy of the Linux kernel repository:

Test
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref              0.12
1500.3: ahead-behind counts: git branch                    0.12
1500.4: ahead-behind counts: git tag                       0.12
1500.5: contains: git for-each-ref --merged                0.04
1500.6: contains: git branch --merged                      0.04
1500.7: contains: git tag --merged                         0.04
1500.8: is-base check: test-tool reach (refs)              0.03
1500.9: is-base check: test-tool reach (tags)              0.03
1500.10: is-base check: git for-each-ref                   0.03
1500.11: is-base check: git for-each-ref (disjoint-base)   0.07

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:10:06 -07:00
9c1732ca11 for-each-ref: add 'is-base' token
The previous change introduced the get_branch_base_for_tip() method in
commit-reach.c. The motivation of that change was about using a heuristic to
deteremine the base branch for a source commit from a list of candidate
commit tips. This change makes that algorithm visible to users via a new
atom in the 'git for-each-ref' format. This change is very similar to the
chang in 49abcd21da (for-each-ref: add ahead-behind format atom,
2023-03-20).

Introduce the 'is-base:<source>' atom, which will indicate that the
algorithm should be computed and the result of the algorithm is reported
using an indicator of the form '(<source>)'. For example, using
'%(is-base:HEAD)' would result in one line having the token '(HEAD)'.

Use the sorted order of refs included in the ref filter to break ties in the
algorithm's heuristic. In the previous change, the motivating examples
include using an L0 trunk, long-lived L1 branches, and temporary release
branches. A caller could communicate the ordered preference among these
categories using the input refpecs and avoiding a different sort mechanism.
This sorting behavior is tested in the test scripts.

It is important to include this atom as a special case to
can_do_iterative_format() to match the expectations created in bd98f9774e
(ref-filter.c: filter & format refs in the same callback, 2023-11-14). The
ahead-behind atom was one of the special cases, and this similarly requires
using an algorithm across all input refs before starting the format of any
single ref.

In the test script, the format tokens use colons or lack whitespace to avoid
Git complaining about trailing whitespace errors.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:10:06 -07:00
69020d034b commit: add gentle reference lookup method
The lookup_commit_reference_by_name() method uses lookup_commit_reference()
without an option to use lookup_commit_reference_gently(). Create a gentle
version of the method so it can be used in locations where non-commits may
be found but error messages should be silenced.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:10:05 -07:00
e32eaf73b0 commit-reach: add get_branch_base_for_tip
Add a new reachability algorithm that intends to discover (from a heuristic)
which branch was used as the starting point for a given commit. Add focused
tests using the 'test-tool reach' command.

In repositories that use pull requests (or merge requests) to advance one or
more "protected" branches, the history of that reference can be recovered by
following the first-parent history in most cases. Most are completed using
no-fast-forward merges, though squash merges are quite common. Less common
is rebase-and-merge, which still validates this assumption. Finally, the
case that breaks this assumption is the fast-forward update (with potential
rebasing).  Even in this case, the previous commit commonly appears in the
first-parent history of the branch.

Similar assumptions can be made for a topic branch created by a single user
with the intention to merge back into another branch. Using 'git commit',
'git merge', and 'git cherry-pick' from HEAD will default to having the
first-parent commit be the previous commit at HEAD. This history changes
only with commands such as 'git reset' or 'git rebase', where the command
names also imply that the branch is starting from a new location.

With this movement of branches in mind, the following heuristic is proposed
as a way to determine the base branch for a given source branch:

  Among a list of candidate base branches, select the candidate that
  minimizes the number of commits in the first-parent history of the source
  that are not in the first-parent history of the candidate.

Prior third-party solutions to this problem have used this optimization
criteria, but have relied upon extracting the first-parent history and
comparing those lists as tables instead of using commit-graph walks.

Given current command-line interface options, this optimization criteria is
not easy to detect directly. Even using the command

  git rev-list --count --first-parent <base>..<source>

does not measure this count, as it uses full reachability from <base> to
determine which commits to remove from the range '<base>..<source>'. This
may lead to one asking if we should instead be using the full reachability
of the candidate and only the first-parent history of the source. This,
unfortunately, does not work for repositories that use long-lived branches
and automation to merge across those branches.

In extremely large repositories, merging into a single trunk may not be
feasible.  This is usually due to the desired frequency of updates
(thousands of engineers doing daily work) combined with the time required to
perform a validation build.  These factors combine to create significant
risk of semantic merge conflicts, leading to build breaks on the trunk. In
response, repository maintainers can create a single Level Zero (L0) trunk
and multiple Level One (L1) branches. By partitioning the engineers by
organization, these engineers may see lower risk of semantic merge conflicts
as well as be protected against build breaks in other L1 branches. The key
to making this system work is a semi-automated process of merging L1
branches into the L0 trunk and vice-versa.  In a large enough organization,
these L1 branches may further split into L2 or L3 branches, but the same
principles apply for merging across deeper levels.

If these automated merges use a typical merge with the second parent
bringing in the "new" content, then each L0 and L1 branch can track its
previous positions by following first-parent history, which appear as
parallel paths (until reaching the first place where the branches diverged).
If we also walk to second parents, then the histories overlap significantly
and cannot be distinguished except for very-recent changes.

For this reason, the first-parent condition should be symmetrical across the
base and source branches.

Another common case for desiring the result of this optimization method is
the use of release branches. When releasing a version of a repository, a
branch can be used to track that release. Any updates that are worth fixing
in that release can be merged to the release branch and shipped with only
the necessary fixes without any new features introduced in the trunk branch.
The 'maint-2.<X>' branches represent this pattern in the Git project. The
microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that
are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This
application doesn't need the symmetrical first-parent condition, but the use
of first-parent histories does not change the results for these branches.

To determine the base branch from a list of candidates, create a new method
in commit-reach.c that performs a single* commit-graph walk. The core
concept is to walk first-parents starting at the candidate bases and the
source, tracking the "best" base to reach a given commit. Use generation
numbers to ensure that a commit is walked at most once and all children have
been explored before visiting it.  When reaching a commit that is reachable
from both a base and the source, we will then have a guarantee that this is
the closest intersection of first-parent histories. Track the best base to
reach that commit and return it as a result. In rare cases involving
multiple root commits, the first-parent history of the source may never
intersect any of the candidates and thus a null result is returned.

* There are up to two walks, since we require all commits to have a computed
  generation number in order to avoid incorrect results. This is similar to
  the need for computed generation numbers in ahead_behind() as implemented
  in fd67d149bd (commit-reach: implement ahead_behind() logic, 2023-03-20).

In order to track the "best" base, use a new commit slab that stores an
integer.  This value defaults to zero upon initialization, so use -1 to
track that the source commit can reach this commit and use 'i + 1' to track
that the ith base can reach this commit. When multiple bases can reach a
commit, minimize the index to break ties. This allows the caller to specify
an order to the bases that determines some amount of preference when the
heuristic does not result in a unique result.

The trickiest part of the integer slab is what happens when reaching a
collision among the histories of the bases and the history of the source.
This is noticed when viewing the first parent and seeing that it has a slab
value that differs in sign (negative or positive). In this case, the
collision commit is stored in the method variable 'branch_point' and its
slab value is set to -1. The index of the best base (so far) is stored in
the method variable 'best_index'. It is possible that there are multiple
commits that have the branch_point as its first parent, leading to multiple
updates of best_index.  The result is determined when 'branch_point' is
visited in the commit walk, giving the guarantee that all commits that could
reach 'branch_point' were visited.

Several interesting cases of collisions and different results are tested in
the t6600-test-reach.sh script. Recall that this script also tests the
algorithm in three possible states involving the commit-graph file and how
many commits are written in the file. This provides some coverage of the
need (and lack of need) for the ensure_generations_valid() method.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:10:05 -07:00
77d4b3dd73 builtin/diff: free symmetric diff members
We populate a `struct symdiff` in case the user has requested a
symmetric diff. Part of this is to populate a `skip` bitmap that
indicates which commits shall be ignored in the diff. But while this
bitmap is dynamically allocated, we never free it.

Fix this by introducing and calling a new `symdiff_release()` function
that does this for us.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:08:02 -07:00
36f971f861 diff: free state populated via options
The `objfind` and `anchors` members of `struct diff_options` are
populated via option parsing, but are never freed in `diff_free()`. Fix
this to plug those memory leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:08:01 -07:00
0aaca0ec09 builtin/log: fix leak when showing converted blob contents
In `show_blob_object()`, we proactively call `textconv_object()`. In
case we have a textconv driver for this blob we will end up showing the
converted contents, otherwise we'll show the un-converted contents of it
instead.

When the object has been converted we never free the buffer containing
the converted contents. Fix this to plug this memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:08:01 -07:00
38678e5df5 userdiff: fix leaking memory for configured diff drivers
The userdiff structures may be initialized either statically on the
stack or dynamically via configuration keys. In the latter case we end
up leaking memory because we didn't have any infrastructure to discern
those strings which have been allocated statically and those which have
been allocated dynamically.

Refactor the code such that we have two pointers for each of these
strings: one that holds the value as accessed by other subsystems, and
one that points to the same string in case it has been allocated. Like
this, we can safely free the second pointer and thus plug those memory
leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:08:01 -07:00
1bc158e750 builtin/format-patch: fix various trivial memory leaks
There are various memory leaks hit by git-format-patch(1). Basically all
of them are trivial, except that un-setting `diffopt.no_free` requires
us to unset the `diffopt.file` because we manually close it already.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:08:01 -07:00
6b15d9ca7f diff: fix leak when parsing invalid ignore regex option
When parsing invalid ignore regexes passed via the `-I` option we don't
free already-allocated memory, leading to a memory leak. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:08:00 -07:00
4dfd4f1dfe unpack-trees: clear index when not propagating it
When provided a pointer to a destination index, then `unpack_trees()`
will end up copying its `o->internal.result` index into the provided
pointer. In those cases it is thus not necessary to free the index, as
we have transferred ownership of it.

There are cases though where we do not end up transferring ownership of
the memory, but `clear_unpack_trees_porcelain()` will never discard the
index in that case and thus cause a memory leak. And right now it cannot
do so in the first place because we have no indicator of whether we did
or didn't transfer ownership of the index.

Adapt the code to zero out the index in case we transfer its ownership.
Like this, we can now unconditionally discard the index when being asked
to clear the `unpack_trees_options`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:08:00 -07:00
2f07d228c3 sequencer: release todo list on error paths
We're not releasing the `todo_list` in `sequencer_pick_revisions()` when
hitting an error path. Restructure the function to have a common exit
path such that we can easily clean up the list and thus plug this memory
leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:08:00 -07:00
de54b450a3 merge-ort: unconditionally release attributes index
We conditionally release the index used for reading gitattributes in
merge-ort based on whether or the index has been populated. This check
uses `cache_nr` as a condition. This isn't sufficient though, as the
variable may be zero even when some other parts of the index have been
populated. This leads to memory leaks when sparse checkouts are in use,
as we may not end up releasing the sparse checkout patterns.

Fix this issue by unconditionally releasing the index.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:08:00 -07:00
a0b82622cb builtin/fast-export: plug leaking tag names
When resolving revisions in `get_tags_and_duplicates()`, we only
partially manage the lifetime of `full_name`. In fact, managing its
lifetime properly is almost impossible because we put direct pointers to
that variable into multiple lists without duplicating the string. The
consequence is that these strings will ultimately leak.

Refactor the code to make the lists we put those names into duplicate
the memory. This allows us to properly free the string as required and
thus plugs the memory leak.

While this requires us to allocate more data overall, it shouldn't be
all that bad given that the number of allocations corresponds with the
number of command line parameters, which typically aren't all that many.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:59 -07:00
8ed4e96b5b builtin/fast-export: fix leaking diff options
Before calling `handle_commit()` in a loop, we set `diffopt.no_free`
such that its contents aren't getting freed inside of `handle_commit()`.
We never unset that flag though, which means that the structure's
allocated resources will ultimately leak.

Fix this by unsetting the flag after the loop such that we release its
resources via `release_revisions()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:59 -07:00
0662f0dacb builtin/fast-import: plug trivial memory leaks
Plug some trivial memory leaks in git-fast-import(1).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:59 -07:00
187b623eef builtin/notes: fix leaking struct notes_tree when merging notes
We allocate a `struct notes_tree` in `merge_commit()` which we then
initialize via `init_notes()`. It's not really necessary to allocate the
structure though given that we never pass ownership to the caller.
Furthermore, the allocation leads to a memory leak because despite its
name, `free_notes()` doesn't free the `notes_tree` but only clears it.

Fix this issue by converting the code to use an on-stack variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:59 -07:00
1ca57bea4a builtin/rebase: fix leaking commit.gpgsign value
In `get_replay_opts()`, we override the `gpg_sign` field that already
got populated by `sequencer_init_config()` in case the user has
"commit.gpgsign" set in their config. This creates a memory leak because
we overwrite the previously assigned value, which may have already
pointed to an allocated string.

Let's plug the memory leak by freeing the value before we overwrite it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:58 -07:00
648abbe22d config: fix leaking comment character config
When the comment line character has been specified multiple times in the
configuration, then `git_default_core_config()` will cause a memory leak
because it unconditionally copies the string into `comment_line_str`
without free'ing the previous value. In fact, it can't easily free the
value in the first place because it may contain a string constant.

Refactor the code such that we track allocated comment character strings
via a separate non-constant variable `comment_line_str_to_free`. Adapt
sites that set `comment_line_str` to set both and free the old value
that was stored in `comment_line_str_to_free`.

This memory leak is being hit in t3404. As there are still other memory
leaks in that file we cannot yet mark it as passing with leak checking
enabled.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:58 -07:00
5f6519b62c submodule-config: fix leaking name entry when traversing submodules
We traverse through submodules in the tree via `tree_entry()`, passing
to it a `struct name_entry` that it is supposed to populate with the
tree entry's contents. We unnecessarily allocate this variable instead
of passing a variable that is allocated on the stack, and the ultimately
don't even free that variable. This is unnecessary and leaks memory.

Convert the variable to instead be allocated on the stack to plug the
memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:58 -07:00
d1c53f6703 read-cache: fix leaking hashfile when writing index fails
In `do_write_index()`, we use a `struct hashfile` to write the index
with a trailer hash. In case the write fails though, we never clean up
the allocated `hashfile` state and thus leak memory.

Refactor the code to have a common exit path where we can free this and
other allocated memory. While at it, refactor our use of `strbuf`s such
that we reuse the same buffer to avoid some unneeded allocations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:58 -07:00
c81dcf630c bulk-checkin: fix leaking state TODO
When flushing a bulk-checking to disk we also reset the `struct
bulk_checkin_packfile` state. But while we free some of its members,
others aren't being free'd, leading to memory leaks:

  - The temporary packfile name is not getting freed.

  - The `struct hashfile` only gets freed in case we end up calling
    `finalize_hashfile()`. There are code paths though where that is not
    the case, namely when nothing has been written. For this, we need to
    make `free_hashfile()` public.

Fix those leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:57 -07:00
9ddd5f755d object-name: fix leaking symlink paths in object context
The object context may be populated with symlink contents when reading a
symlink, but the associated strbuf doesn't ever get released when
releasing the object context, causing a memory leak. Plug it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:57 -07:00
aa9ef614dc object-file: fix memory leak when reading corrupted headers
When reading corrupt object headers in `read_loose_object()`, we bail
out immediately. This causes a memory leak though because we would have
already initialized the zstream in `unpack_loose_header()`, and it is
the callers responsibility to finish the zstream even on error. While
this feels weird, other callsites do it correctly already.

Fix this leak by ending the zstream even on errors. We may want to
revisit this interface in the future such that the callee handles this
for us already when there was an error.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:57 -07:00
ce15f9eb9e git: fix leaking system paths
Git has some flags to make it output system paths as they have been
compiled into Git. This is done by calling `system_path()`, which
returns an allocated string. This string isn't ever free'd though,
creating a memory leak.

Plug those leaks. While they are surfaced by t0211, there are more
memory leaks looming exposed by that test suite and it thus does not yet
pass with the memory leak checker enabled.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:56 -07:00
ce01f92889 remote: plug memory leak when aliasing URLs
When we have a `url.*.insteadOf` configuration, then we end up aliasing
URLs when populating remotes. One place where this happens is in
`alias_all_urls()`, where we loop through all remotes and then alias
each of their URLs. The actual aliasing logic is then contained in
`alias_url()`, which returns an allocated string that contains the new
URL. This URL replaces the old URL that we have in the strvec that
contains all remote URLs.

We replace the remote URLs via `strvec_replace()`, which does not hand
over ownership of the new string to the vector. Still, we didn't free
the aliased URL and thus have a memory leak here. Fix it by freeing the
aliased string.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:07:56 -07:00
16d89aa975 t9001-send-email.sh: fix quoting for mailrc --dump-aliases test
The .mailrc alias file format documents that multiple addresses are
separated by spaces. The alias file used in the t9001 --dump-aliases
mailrc test have addresses which include both a name and email. These
are unquoted, so git send-email will parse this as an alias that
translates to multiple independent addresses.

The existing test does not care about this, as --dump-aliases only dumps
the alias and not the address. However, it is incorrect for a future
where --dump-aliases could also dump the mail addresses.

Fix the test to quote the aliases properly, so that they translate to a
single address.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 09:55:03 -07:00
1784522a1f midx: drop unused parameters from add_midx_to_chain()
When loading a chained midx, we build up an array of hashes, one per
layer of the chain. But since the chain is also represented by the
linked list of multi_pack_index structs, nobody actually reads this
array. We pass it to add_midx_to_chain(), but the parameters are
completely ignored.

So we can drop those unused parameters. And then we can see that its
sole caller, load_midx_chain_fd_st(), only cares about one layer hash at a
time (for parsing each line and feeding it to the single-layer midx
code). So we can replace the array with a single object_id on the stack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13 10:36:34 -07:00
96a9a3e42e bundle: default to SHA1 when reading bundle headers
We hit a segfault when trying to open a bundle via `git bundle
list-heads` when running outside of a repository. This is caused by
c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), which stopped setting the default object hash so that
`the_hash_algo` is a `NULL` pointer when running outside of any repo.

This is only a symptom of a deeper issue though. Bundles default to the
SHA1 object format unless they advertise an "@object-format=" header.
Consequently, it has been wrong in the first place to use the object
format used by the current repository when parsing bundles. The
consequence is that trying to open a bundle that uses a different object
hash than the current repository will fail:

    $ git bundle list-heads sha1.bundle
    error: unrecognized header: ee4b540943284700a32591ad09f7e15bdeb2a10c HEAD (45)

Fix the bug by defaulting to the SHA1 object hash. We already handle the
"@object-format=" header as expected, so we don't need to adapt this
part.

Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13 10:26:44 -07:00
7298bcc573 builtin/bundle: have unbundle check for repo before opening its bundle
The `git bundle unbundle` subcommand requires a repository to unbundle
the contents into. As thus, the subcommand checks whether we have a
startup repository in the first place, and if not it dies.

This check happens after we have already opened the bundle though. This
causes a segfault when running outside of a repository starting with
c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07) because we have no hash function set up, but we do try to
parse refs advertised by the bundle's header.

The next commit will fix that underlying issue by defaulting to the SHA1
object format for bundles, which will also fix the described segfault here.
But as we know that we will die anyway, we can do better than that and
avoid some vain work by moving the check for a repository before we try
to open the bundle.

Reported-by: ArcticLampyrid <ArcticLampyrid@outlook.com>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13 10:26:20 -07:00