Commit Graph

75368 Commits

6f33d8e255 builtin: pass repository to sub commands
In 9b1cb5070f (builtin: add a repository parameter for builtin
functions, 2024-09-13) the repository was passed down to all builtin
commands. This allowed the repository to be passed down to lower layers
without depending on the global `the_repository` variable.

Continue this work by also passing down the repository parameter from
the command to sub-commands. This will help pass down the repository to
other subsystems and clean up usage of global variables like
'the_repository' and 'the_hash_algo'.
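To make the calling convention concrete, here is a minimal C sketch of a builtin that forwards its repository to sub-commands. Everything in it is hypothetical. `struct repository` is reduced to a one-field stand-in, and `cmd_demo()`, its sub-commands, and the `last_repo` test hook are invented for illustration rather than taken from Git. The function-pointer typedef only mirrors the idea of a shared sub-command signature that now carries `struct repository *repo`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for Git's struct repository. */
struct repository {
	const char *gitdir;
};

/* Shared sub-command signature: the repository is an explicit parameter. */
typedef int subcommand_fn(int argc, const char **argv, const char *prefix,
			  struct repository *repo);

/* Test hook only: records which repository the last sub-command received. */
static const struct repository *last_repo;

static int cmd_demo_show(int argc, const char **argv, const char *prefix,
			 struct repository *repo)
{
	(void)argc;
	(void)argv;
	(void)prefix;
	last_repo = repo; /* reads the parameter, not a global repository */
	return 0;
}

static int cmd_demo_list(int argc, const char **argv, const char *prefix,
			 struct repository *repo)
{
	(void)argc;
	(void)argv;
	(void)prefix;
	last_repo = repo;
	return 0;
}

static const struct {
	const char *name;
	subcommand_fn *fn;
} demo_subcommands[] = {
	{ "show", cmd_demo_show },
	{ "list", cmd_demo_list },
};

/* Hypothetical builtin: dispatches on argv[1] and forwards 'repo' unchanged. */
int cmd_demo(int argc, const char **argv, const char *prefix,
	     struct repository *repo)
{
	assert(repo != NULL);
	if (argc < 2)
		return -1;
	for (size_t i = 0; i < sizeof(demo_subcommands) / sizeof(demo_subcommands[0]); i++)
		if (!strcmp(argv[1], demo_subcommands[i].name))
			return demo_subcommands[i].fn(argc - 1, argv + 1,
						      prefix, repo);
	return -1; /* unknown sub-command */
}
```

The dispatcher never looks up a global. Whatever repository the builtin was handed is exactly what each sub-command sees.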

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26 10:36:08 +09:00
6ea2d9d271 Sync with Git 2.47.1
* maint:
  Git 2.47.1
  Makefile(s): avoid recipe prefix in conditional statements
  doc: switch links to https
  doc: update links to current pages
2024-11-25 12:33:36 +09:00
92999a42db Git 2.47.1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-25 12:32:21 +09:00
b3ba1efa50 Merge branch 'ak/typofixes' into maint-2.47
Typofixes.

* ak/typofixes:
  t: fix typos
  t/helper: fix a typo
  t/perf: fix typos
  t/unit-tests: fix typos
  contrib: fix typos
  compat: fix typos
2024-11-25 12:29:48 +09:00
00c388f487 Merge branch 'xx/protocol-v2-doc-markup-fix' into maint-2.47
Docfix.

* xx/protocol-v2-doc-markup-fix:
  Documentation/gitprotocol-v2.txt: fix a slight inconsistency in format
2024-11-25 12:29:47 +09:00
3357b3d88d Merge branch 'tc/bundle-uri-leakfix' into maint-2.47
Leakfix.

* tc/bundle-uri-leakfix:
  bundle-uri: plug leak in unbundle_from_file()
2024-11-25 12:29:46 +09:00
058c36aa26 Merge branch 'kh/checkout-ignore-other-docfix' into maint-2.47
Doc updates.

* kh/checkout-ignore-other-docfix:
  checkout: refer to other-worktree branch, not ref
2024-11-25 12:29:45 +09:00
fd78021b91 Merge branch 'kh/merge-tree-doc' into maint-2.47
Docfix.
cf. <CABPp-BE=JfoZp19Va-1oF60ADBUibGDwDkFX-Zytx7A3uJ__gg@mail.gmail.com>

* kh/merge-tree-doc:
  doc: merge-tree: improve example script
2024-11-25 12:29:44 +09:00
bd8a8a71dc Merge branch 'kn/loose-object-layer-wo-global-hash' into maint-2.47
Code clean-up.

* kn/loose-object-layer-wo-global-hash:
  loose: don't rely on repository global state
2024-11-25 12:29:43 +09:00
5f380e4017 Merge branch 'jc/doc-refspec-syntax' into maint-2.47
Doc updates.

* jc/doc-refspec-syntax:
  doc: clarify <src> in refspec syntax
2024-11-25 12:29:42 +09:00
f675674ced Merge branch 'js/doc-platform-support-link-fix' into maint-2.47
Docfix.

* js/doc-platform-support-link-fix:
  docs: fix the `maintain-git` links in `technical/platform-support`
2024-11-25 12:29:41 +09:00
e52276d340 Merge branch 'jh/config-unset-doc-fix' into maint-2.47
Docfix.

* jh/config-unset-doc-fix:
  git-config.1: remove value from positional args in unset usage
2024-11-25 12:29:40 +09:00
6b03fd8dcd Merge branch 'jk/output-prefix-cleanup' into maint-2.47
Code clean-up.

* jk/output-prefix-cleanup:
  diff: store graph prefix buf in git_graph struct
  diff: return line_prefix directly when possible
  diff: return const char from output_prefix callback
  diff: drop line_prefix_length field
  line-log: use diff_line_prefix() instead of custom helper
2024-11-25 12:29:39 +09:00
304e77d2f8 Merge branch 'sk/doc-maintenance-schedule' into maint-2.47
Doc update to clarify how periodic maintenance tasks are scheduled,
spread across time to avoid thundering herds.

* sk/doc-maintenance-schedule:
  doc: add a note about staggering of maintenance
2024-11-25 12:29:38 +09:00
2a18f26d77 Merge branch 'tb/notes-amlog-doc' into maint-2.47
Document "amlog" notes.

* tb/notes-amlog-doc:
  Documentation: mention the amlog in howto/maintain-git.txt
2024-11-25 12:29:37 +09:00
98c839d58f Merge branch 'master' of https://github.com/j6t/gitk into maint-2.47
* 'master' of https://github.com/j6t/gitk:
  Makefile(s): avoid recipe prefix in conditional statements
  doc: switch links to https
  doc: update links to current pages
2024-11-25 12:20:42 +09:00
c18400c6bb Makefile(s): avoid recipe prefix in conditional statements
In GNU Make commit 07fcee35 ([SV 64815] Recipe lines cannot contain
conditional statements, 2023-05-22) and following, conditional
statements may no longer be preceded by a tab character (which Make
refers to as the recipe prefix).

There are a handful of spots in our various Makefile(s) which will break
in a future release of Make containing 07fcee35. For instance, trying to
compile the pre-image of this patch with the tip of make.git results in
the following:

    $ make -v | head -1 && make
    GNU Make 4.4.90
    config.mak.uname:842: *** missing 'endif'.  Stop.

The kernel addressed this issue in 82175d1f9430 (kbuild: Replace tabs
with spaces when followed by conditionals, 2024-01-28). Address the
issues in Git's tree by applying the same strategy.

When a conditional word (ifeq, ifneq, ifdef, etc.) is preceded by one or
more tab characters, replace each tab character with 8 space characters
with the following:

    find . -type f -not -path './.git/*' -name Makefile -or -name '*.mak' |
      xargs perl -i -pe '
        s/(\t+)(ifn?eq|ifn?def|else|endif)/" " x (length($1) * 8) . $2/ge unless /\\$/
      '

The "unless /\\$/" removes any false positives (like "\telse \"
appearing within a shell script as part of a recipe).
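For readers who prefer C to Perl, the same substitution can be sketched as a per-line helper. `expand_conditional_tabs()` is a hypothetical name and is not part of Git or GNU Make; this only illustrates what the one-liner does. Like the regex, it rewrites any run of tabs directly followed by `ifeq`, `ifneq`, `ifdef`, `ifndef`, `else`, or `endif`, matching on the prefix with no word-boundary check. It leaves lines ending in a backslash untouched.

```c
#include <assert.h>
#include <string.h>

/* Keywords matched by the Perl pattern (ifn?eq|ifn?def|else|endif). */
static const char *const cond_words[] = {
	"ifeq", "ifneq", "ifdef", "ifndef", "else", "endif",
};

/* Prefix match, like the unanchored regex alternation. */
static int starts_with_cond(const char *s)
{
	for (size_t i = 0; i < sizeof(cond_words) / sizeof(cond_words[0]); i++)
		if (!strncmp(s, cond_words[i], strlen(cond_words[i])))
			return 1;
	return 0;
}

/* Mirrors the "unless" guard: backslash at end of line (before any final newline). */
static int ends_with_backslash(const char *line)
{
	size_t len = strlen(line);

	if (len && line[len - 1] == '\n')
		len--;
	return len && line[len - 1] == '\\';
}

/*
 * Copy 'line' into 'out', replacing each run of tabs that is directly
 * followed by a conditional keyword with 8 spaces per tab.
 * Returns 0 on success, -1 if 'out' (of size 'outsz') is too small.
 */
int expand_conditional_tabs(const char *line, char *out, size_t outsz)
{
	size_t o = 0;

	assert(line && out);
	if (ends_with_backslash(line)) {
		if (strlen(line) >= outsz)
			return -1;
		memcpy(out, line, strlen(line) + 1); /* leave the line as-is */
		return 0;
	}
	while (*line) {
		size_t run = strspn(line, "\t");

		if (run && starts_with_cond(line + run)) {
			if (o + run * 8 >= outsz)
				return -1;
			memset(out + o, ' ', run * 8);
			o += run * 8;
			line += run;
			continue;
		}
		if (!run)
			run = 1; /* copy one ordinary character */
		if (o + run >= outsz)
			return -1;
		memcpy(out + o, line, run);
		o += run;
		line += run;
	}
	if (o >= outsz)
		return -1;
	out[o] = '\0';
	return 0;
}
```

The helper works one line at a time, so a caller would feed it each line of a Makefile in turn, much as `perl -pe` does.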

After doing so, Git compiles on newer versions of Make:

    $ make -v | head -1 && make
    GNU Make 4.4.90
    GIT_VERSION = 2.44.0.414.gfac1dc44ca9
    [...]

    $ echo $?
    0

Reported-by: Dario Gjorgjevski <dario.gjorgjevski@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cherry-picked-from: 728b9ac0c3
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2024-11-24 13:45:49 +01:00
ed87b13a50 doc: switch links to https
These sites offer https versions of their content.
Using the https versions provides some protection for users.

Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cherry-picked-from: d05b08cd52
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2024-11-24 13:44:39 +01:00
7539e569ef doc: update links to current pages
It's somewhat traditional to respect sites' self-identification.

Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cherry-picked-from: 65175d9ea2
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2024-11-24 13:43:45 +01:00
04eaff62f2 The eleventh batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-22 14:34:19 +09:00
0a83b39594 Merge branch 'tb/multi-pack-reuse-dupfix'
Object reuse code based on multi-pack-index sent an unwanted
duplicate copy of an object.

* tb/multi-pack-reuse-dupfix:
  pack-objects: only perform verbatim reuse on the preferred pack
  t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure
2024-11-22 14:34:19 +09:00
76bb16db5c Merge branch 'sm/difftool'
Use of some uninitialized variables in "git difftool" has been
corrected.

* sm/difftool:
  builtin/difftool: initialize some hashmap variables
2024-11-22 14:34:18 +09:00
aa1d4b42e5 Merge branch 'jk/fetch-prefetch-double-free-fix'
Double-free fix.

* jk/fetch-prefetch-double-free-fix:
  refspec: store raw refspecs inside refspec_item
  refspec: drop separate raw_nr count
  fetch: adjust refspec->raw_nr when filtering prefetch refspecs
2024-11-22 14:34:17 +09:00
0b9b6cda6e Merge branch 'jk/test-malloc-debug-check'
Avoid build/test breakage on a system without a working malloc debug
support dynamic library.

* jk/test-malloc-debug-check:
  test-lib: move malloc-debug setup after $PATH setup
  test-lib: check malloc debug LD_PRELOAD before using
2024-11-22 14:34:16 +09:00
4083a6f052 Sync with 'maint'
2024-11-20 14:47:56 +09:00
44ac252971 The tenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20 14:47:17 +09:00
38e4df6615 Merge branch 'la/trailer-info'
Renaming a handful of variables and structure fields.

* la/trailer-info:
  trailer: spread usage of "trailer_block" language
2024-11-20 14:47:17 +09:00
ff44124044 Merge branch 'ja/git-add-doc-markup'
Documentation mark-up updates.

* ja/git-add-doc-markup:
  doc: git-add.txt: convert to new style convention
2024-11-20 14:47:17 +09:00
0c11ef1356 Merge branch 'jt/repack-local-promisor'
"git gc" discards any objects that are outside promisor packs that
are referred to by an object in a promisor pack, and we do not
refetch them from the promisor at runtime, resulting in an unusable
repository.  Work it around by including these objects in the
referring promisor pack at the receiving end of the fetch.

* jt/repack-local-promisor:
  index-pack: repack local links into promisor packs
  t5300: move --window clamp test next to unclamped
  t0410: use from-scratch server
  t0410: make test description clearer
2024-11-20 14:47:16 +09:00
f1a384425d Prepare for 2.47.1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20 14:43:30 +09:00
cc53ddf7f0 Merge branch 'db/submodule-fetch-with-remote-name-fix' into maint-2.47
A "git fetch" from the superproject going down to a submodule used
a wrong remote when the default remote names are set differently
between them.

* db/submodule-fetch-with-remote-name-fix:
  submodule: correct remote name with fetch
2024-11-20 14:43:00 +09:00
257f2de964 Merge branch 'ps/cache-tree-w-broken-index-entry' into maint-2.47
Fail gracefully instead of crashing when attempting to write the
contents of a corrupt in-core index as a tree object.

* ps/cache-tree-w-broken-index-entry:
  unpack-trees: detect mismatching number of cache-tree/index entries
  cache-tree: detect mismatching number of index entries
  cache-tree: refactor verification to return error codes
2024-11-20 14:42:59 +09:00
76c1953395 Merge branch 'ps/maintenance-start-crash-fix' into maint-2.47
"git maintenance start" crashed due to an uninitialized variable
reference, which has been corrected.

* ps/maintenance-start-crash-fix:
  builtin/gc: fix crash when running `git maintenance start`
2024-11-20 14:42:58 +09:00
f1a50f12b9 Merge branch 'jk/fsmonitor-event-listener-race-fix' into maint-2.47
On macOS, fsmonitor can fall into a race condition that results in
a client waiting forever to be notified for an event that has
already happened.  This problem has been corrected.

* jk/fsmonitor-event-listener-race-fix:
  fsmonitor: initialize fs event listener before accepting clients
  simple-ipc: split async server initialization and running
2024-11-20 14:42:57 +09:00
3117dd359a Merge branch 'ds/line-log-asan-fix' into maint-2.47
A use-after-free and a double free at the end of "git log -L... -p"
have been identified and fixed.

* ds/line-log-asan-fix:
  line-log: protect inner strbuf from free
2024-11-20 14:42:56 +09:00
090d24e9af Clean up RelNotes for 2.48
The draft release notes somehow ended up with too many bogus "merge X
later to maint" comments for topics that can never be merged down to
'maint', because they were forked from more recent integration
branches.  Remove them, as they invite mistakes later.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-16 02:27:40 +09:00
e199290592 pack-objects: only perform verbatim reuse on the preferred pack
When reusing objects from source pack(s), write_reused_pack_verbatim()
is responsible for reusing objects a whole eword_t at a time. It works
by taking the longest continuous run of objects from the beginning of
each source pack that the caller wants, and reuses the entirety of that
section from each pack.

This is based on the assumption that we don't have any gaps within the
region. This assumption relieves us from having to patch any
OFS_DELTAs, since we know that there aren't any gaps between any delta
and its base in that region.

To illustrate why this assumption is necessary, suppose we have some
pack P, which has objects X, Y, and Z. If the MIDX's copy of Y was
selected from a pack other than P, then the bit corresponding to object
Y will appear earlier in the bitmap than the bits corresponding to X and
Z.

If pack-objects already has or will use the copy of Y from the pack it
was selected from in the MIDX, then it is an error to reuse all objects
between X and Z in the source pack. Doing so will cause us to reuse Y
from a different pack than the one which represents Y in the MIDX,
causing us to either:

 - include the object twice, assuming that the caller wants Y in the
   pack, or

 - include the object once, resulting in us packing more objects than
   necessary.

This regression comes from ca0fd69e37 (pack-objects: prepare
`write_reused_pack_verbatim()` for multi-pack reuse, 2023-12-14), which
incorrectly assumed that there would be no gaps in reusable regions of
non-preferred packs.

Instead, we can only safely perform the whole-word reuse optimization on
the preferred pack, where we know with certainty that no gaps exist in
that region of the bitmap. We can still reuse objects from non-preferred
packs, but we have to inspect them individually in write_reused_pack()
to ensure that any gaps that may exist are accounted for.

This allows us to simplify the implementation of
write_reused_pack_verbatim() back to almost its pre-multi-pack reuse
form, since we can now assume that the beginning of the pack appears at
the beginning of the bitmap, meaning that we don't have to account for
any bits up to the first word boundary (like we had to special case in
ca0fd69e37).

The only significant changes from the pre-ca0fd69e37 implementation are:

 - that we can no longer inspect words up to the end of
   reuse_packfile_bitmap->word_alloc, since we only want to look at
   words whose bits all correspond to objects in the given packfile, and

 - that we return early when given a reuse_packfile which is not
   preferred, making the call a noop.
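The simplified rule described above can be illustrated with a small C function that counts how many leading bitmap words qualify for whole-word reuse. This is a sketch under stated assumptions, not Git's write_reused_pack_verbatim(). `verbatim_reuse_words()` is hypothetical. Words are taken to be 64-bit, as `eword_t` is, and the pack is assumed to start at bit 0, which holds only for the preferred pack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_IN_WORD 64

/*
 * Count leading bitmap words that could be copied verbatim. A word
 * qualifies only if the pack is the preferred one, every bit in the word
 * is set (all of those objects are wanted), and every bit maps to an
 * object that actually lives in this pack.
 */
size_t verbatim_reuse_words(const uint64_t *words, size_t nr_words,
			    size_t pack_nr_objects, int is_preferred)
{
	size_t i = 0;

	assert(words || nr_words == 0);
	if (!is_preferred)
		return 0; /* non-preferred packs go object-by-object instead */

	while (i < nr_words &&
	       (i + 1) * BITS_IN_WORD <= pack_nr_objects &&
	       words[i] == UINT64_MAX)
		i++;
	return i;
}
```

Any objects past the last qualifying word would still need the per-object path.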

In the future, it might be possible to restore this optimization if we
could guarantee that some reuse packs don't contain any gaps by
construction (similar to the "disjoint packs" idea in very early
versions of multi-pack reuse).
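One way to state the "no gaps" condition is this: a pack has no gaps when the MIDX selected every one of its objects from that pack. In that case, copying a byte range of the pack cannot drag in an object that the bitmap attributes to another pack. The check below is a hypothetical sketch of that condition. The `selected_pack` array, holding one pack id per MIDX object, is an assumed simplification of the MIDX's data, not its actual layout. In the X, Y, Z example above, a copy of Y chosen from another pack makes pack P fail the check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * selected_pack[i] is the id of the pack the MIDX chose for object i.
 * Returns 1 if every object in 'pack_id' was selected from that pack,
 * i.e. no object in it is represented by a copy from elsewhere.
 */
int pack_is_gap_free(const uint32_t *selected_pack, size_t nr_midx_objects,
		     uint32_t pack_id, size_t pack_nr_objects)
{
	size_t selected_here = 0;

	for (size_t i = 0; i < nr_midx_objects; i++)
		if (selected_pack[i] == pack_id)
			selected_here++;

	/* Each object is selected once, so this cannot exceed the pack size. */
	assert(selected_here <= pack_nr_objects);
	return selected_here == pack_nr_objects;
}
```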

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-15 09:13:31 +09:00
57f35cfd7c t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure
In the multi-pack reuse code, there are two paths for reusing the
on-disk representation of an object, handled by:

  - builtin/pack-objects.c::write_reused_pack_one()
  - builtin/pack-objects.c::write_reused_pack_verbatim()

The former is responsible for copying the bytes for a single object out
of an existing source pack. The latter does the same but for a region of
objects aligned at eword_t boundaries.

Demonstrate a bug whereby write_reused_pack_verbatim() can be tricked
into writing out objects from some source pack, even when those objects
were selected from a different source pack in the MIDX bitmap.

When the caller wants at least one of the objects in that region,
pack-objects will write the same object twice as a result of this bug.
In the other case where the caller doesn't want any of the objects in
the region of interest, we will write out objects that weren't
requested.

Demonstrate this bug by creating two packs, where the preferred one of
those packs contains a single object which also appears in the main
(non-preferred) pack. A separate bug[^1] prevents us from triggering the
main bug when the duplicated object is the last one in the main pack,
but any earlier object will suffice.

We could fix that separate bug, but the following commit will simplify
write_reused_pack_verbatim() and only call it on the preferred pack, so
doing so would have little point.

[^1]: Because write_reused_pack_verbatim() only reuses bits in the range

    off_t pack_start_off = pack_pos_to_offset(reuse_packfile->p, 0);
    off_t pack_end_off = pack_pos_to_offset(reuse_packfile->p,
                                            pos - reuse_packfile->bitmap_pos);

    written += pos - reuse_packfile->bitmap_pos;

    /* We're recording one chunk, not one object. */
    record_reused_object(pack_start_off,
                         pack_start_off - (hashfile_total(out) - pack_start));

  , or in other words excluding the object beginning at position 'pos -
  reuse_packfile->bitmap_pos' in the source pack. But since
  reuse_packfile->bitmap_pos is '1' in the non-preferred pack
  (accounting for the single-object pack which is preferred), we don't
  actually copy the bytes from the last object.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-15 09:13:31 +09:00
72ad6dc368 test-lib: move malloc-debug setup after $PATH setup
Originally, the conditional definition of the setup/teardown functions
for malloc checking could be run at any time, because they depended only
on command-line options and the system getconf function.

But since 02d900361c (test-lib: check malloc debug LD_PRELOAD before
using, 2024-11-11), we probe the system by running "git version". Since
this code runs before we've set $PATH to point to the version of Git we
intend to test, we actually run the system version of git.

This mostly works, since what we really care about is whether the
LD_PRELOAD works, and it should work the same with any program. But
there are some corner cases:

  1. You might not have a system git at all, in which case the preload
     will appear to fail, even though it could work with the actual
     built version of git.

  2. Your system git could be linked in a different way. For example, if
     it was built statically, then it will ignore LD_PRELOAD entirely,
     and we might assume that the preload works, even though it might
     not when used with a dynamic build.

We could give a more complete path to the version of Git we intend to
test, but features like GIT_TEST_INSTALLED make that not entirely
trivial. So instead, let's just bump the setup until after we've set up
the $PATH. There's no need for us to do it early, as long as it is done
before the first test runs.

Reported-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-14 12:19:26 +09:00
25b0f41288 The ninth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-13 08:35:34 +09:00
183ea3eabf Merge branch 'ps/mingw-rename'
The MinGW compatibility layer has been taught to support POSIX
semantics for atomic renames when other process(es) have a file
opened at the destination path.

* ps/mingw-rename:
  compat/mingw: support POSIX semantics for atomic renames
  compat/mingw: allow deletion of most opened files
  compat/mingw: share file handles created via `CreateFileW()`
2024-11-13 08:35:34 +09:00
486c9d3995 Merge branch 'jt/commit-graph-missing'
A regression where commit objects missing from a commit-graph can
cause an infinite loop when doing a fetch in a partial clone has
been fixed.

* jt/commit-graph-missing:
  fetch-pack: die if in commit graph but not obj db
  Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
2024-11-13 08:35:33 +09:00
51ba601160 Merge branch 'en/shallow-exclude-takes-a-ref-fix'
The "--shallow-exclude=<ref>" option to various history transfer
commands takes a ref, not an arbitrary revision.

* en/shallow-exclude-takes-a-ref-fix:
  doc: correct misleading descriptions for --shallow-exclude
  upload-pack: fix ambiguous error message
2024-11-13 08:35:32 +09:00
110c8fe8f5 Merge branch 'ak/t1016-style'
Test modernization.

* ak/t1016-style:
  t1016: clean up style
2024-11-13 08:35:32 +09:00
6890c99e38 Merge branch 'ps/leakfixes-part-9'
More leakfixes.

* ps/leakfixes-part-9: (22 commits)
  list-objects-filter-options: work around reported leak on error
  builtin/merge: release output buffer after performing merge
  dir: fix leak when parsing "status.showUntrackedFiles"
  t/helper: fix leaking buffer in "dump-untracked-cache"
  t/helper: stop re-initialization of `the_repository`
  sparse-index: correctly free EWAH contents
  dir: release untracked cache data
  combine-diff: fix leaking lost lines
  builtin/tag: fix leaking key ID on failure to sign
  transport-helper: fix leaking import/export marks
  builtin/commit: fix leaking cleanup config
  trailer: fix leaking strbufs when formatting trailers
  trailer: fix leaking trailer values
  builtin/commit: fix leaking change data contents
  upload-pack: fix leaking URI protocols
  pretty: clear signature check
  diff-lib: fix leaking diffopts in `do_diff_cache()`
  revision: fix leaking bloom filters
  builtin/grep: fix leak with `--max-count=0`
  grep: fix leak in `grep_splice_or()`
  ...
2024-11-13 08:35:31 +09:00
98e4015593 builtin/difftool: initialize some hashmap variables
When running a dir-diff command that produces no diff, variables
`wt_modified` and `tmp_modified` are used while uninitialized, causing:

    $ /home/smarchi/src/git/git-difftool --dir-diff master
    free(): invalid pointer
    [1]    334004 IOT instruction (core dumped)  /home/smarchi/src/git/git-difftool --dir-diff master
    $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master
    ...
    Invalid free() / delete / delete[] / realloc()
       at 0x48478EF: free (vg_replace_malloc.c:989)
       by 0x422CAC: hashmap_clear_ (hashmap.c:208)
       by 0x283830: run_dir_diff (difftool.c:667)
       by 0x284103: cmd_difftool (difftool.c:801)
       by 0x238E0F: run_builtin (git.c:484)
       by 0x2392B9: handle_builtin (git.c:750)
       by 0x2399BC: cmd_main (git.c:921)
       by 0x356FEF: main (common-main.c:64)
     Address 0x1ffefff180 is on thread 1's stack
     in frame #2, created by run_dir_diff (difftool.c:358)
    ...

If any `goto finish` path is taken before these variables are
initialized, `hashmap_clear_and_free()` operates on uninitialized data,
sometimes causing a crash.

This regression was introduced in 7f795a1715 (builtin/difftool: plug
several trivial memory leaks, 2024-09-26).

Fix it by initializing those variables with the `HASHMAP_INIT` macro.
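The shape of the fix is a general C idiom. Every variable that the cleanup label touches gets a valid initial value at its declaration, so any early `goto finish` is safe. The sketch below shows that idiom with a hypothetical `struct tinymap` and `TINYMAP_INIT` standing in for Git's hashmap and `HASHMAP_INIT`. `run_dir_diff_sketch()` only borrows the variable names from the commit and is not difftool's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal container standing in for Git's hashmap. */
struct tinymap {
	char **keys;
	size_t nr;
};

/* Like HASHMAP_INIT: a valid, empty value that is safe to clear. */
#define TINYMAP_INIT { NULL, 0 }

static int tinymap_add(struct tinymap *map, const char *key)
{
	size_t len = strlen(key);
	char **grown = realloc(map->keys, (map->nr + 1) * sizeof(*grown));
	char *copy;

	if (!grown)
		return -1;
	map->keys = grown;
	copy = malloc(len + 1);
	if (!copy)
		return -1;
	memcpy(copy, key, len + 1);
	map->keys[map->nr++] = copy;
	return 0;
}

/* Safe on a TINYMAP_INIT value, since free(NULL) does nothing. */
static void tinymap_clear(struct tinymap *map)
{
	assert(map != NULL);
	for (size_t i = 0; i < map->nr; i++)
		free(map->keys[i]);
	free(map->keys);
	map->keys = NULL;
	map->nr = 0;
}

/* Hypothetical: borrows variable names from run_dir_diff(), not its logic. */
int run_dir_diff_sketch(int have_changes)
{
	/* Initialized at declaration, so every 'goto finish' sees valid maps. */
	struct tinymap wt_modified = TINYMAP_INIT;
	struct tinymap tmp_modified = TINYMAP_INIT;
	int ret;

	if (!have_changes) {
		ret = 0;
		goto finish; /* an early exit like the one that hit garbage */
	}
	if (tinymap_add(&wt_modified, "file.c") < 0 ||
	    tinymap_add(&tmp_modified, "file.c") < 0) {
		ret = -1;
		goto finish;
	}
	ret = (int)(wt_modified.nr + tmp_modified.nr);

finish:
	tinymap_clear(&wt_modified);
	tinymap_clear(&tmp_modified);
	return ret;
}
```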

Add a test comparing the main branch to itself, resulting in no diff.

Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-13 08:11:19 +09:00
fe17a25905 refspec: store raw refspecs inside refspec_item
The refspec struct keeps two matched arrays: one for the refspec_item
structs and one for the original raw refspec strings. The main reason
for this is that there are other users of refspec_item that do not care
about the raw strings. But it does make managing the refspec struct
awkward, as we must keep the two arrays in sync. This has led to bugs in
the past (both leaks and double-frees).

Let's just store a copy of the raw refspec string directly in each
refspec_item struct. This simplifies the handling at a small cost:

  1. Direct callers of refspec_item_init() will now get an extra copy of
     the refspec string, even if they don't need it. This should be
     negligible, as the struct is already allocating two strings for the
     parsed src/dst values (and we tend to only do it sparingly anyway
     for things like the TAG_REFSPEC literal).

  2. Users of refspec_appendf() will now generate a temporary string,
     copy it, and then free the result (versus handing off ownership of
     the temporary string). We could get around this by having a "nodup"
     variant of refspec_item_init(), but it doesn't seem worth the extra
     complexity for something that is not remotely a hot code path.

Code which accesses refspec->raw now needs to look at refspec->item.raw.
Other callers which just use refspec_item directly can remain the same.
We'll free the allocated string in refspec_item_clear(), which they
should be calling anyway to free src/dst.

One subtle note: refspec_item_init() can return an error, in which case
we'll still have set its "raw" field. But that is also true of the "src"
and "dst" fields, so any caller which does not _clear() the failed item
is already potentially leaking. In practice most code just calls die()
on an error anyway, but you can see the exception in valid_fetch_refspec(),
which does correctly call _clear() even on error.
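The following is a stripped-down model of an item that owns its raw string, including the "set raw, then fail" behavior noted above. The struct and functions are hypothetical and much simpler than Git's refspec_item_init(). The invented parsing rule accepts only "src:dst" with a non-empty src. Real refspec syntax (a leading "+", omitted sides, pattern semantics) is not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified refspec item that owns a copy of its raw text. */
struct refspec_item_sketch {
	char *src;
	char *dst;
	char *raw;
};

static char *copy_n(const char *s, size_t len)
{
	char *copy = malloc(len + 1);

	if (copy) {
		memcpy(copy, s, len);
		copy[len] = '\0';
	}
	return copy;
}

/*
 * Parse "src:dst". 'raw' is stored before validation, so callers must
 * call refspec_item_sketch_clear() even when this returns -1.
 */
int refspec_item_sketch_init(struct refspec_item_sketch *item,
			     const char *refspec)
{
	const char *colon;

	assert(item && refspec);
	item->src = item->dst = NULL;
	item->raw = copy_n(refspec, strlen(refspec));
	if (!item->raw)
		return -1;
	colon = strchr(refspec, ':');
	if (!colon || colon == refspec)
		return -1; /* invented rule: need a colon and a non-empty src */
	item->src = copy_n(refspec, (size_t)(colon - refspec));
	item->dst = copy_n(colon + 1, strlen(colon + 1));
	return (item->src && item->dst) ? 0 : -1;
}

/* Frees src, dst, and raw alike; safe after a failed init. */
void refspec_item_sketch_clear(struct refspec_item_sketch *item)
{
	free(item->src);
	free(item->dst);
	free(item->raw);
	item->src = item->dst = item->raw = NULL;
}
```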

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 18:16:48 +09:00
d36af33081 refspec: drop separate raw_nr count
A refspec struct contains zero or more refspec_item structs, along with
matching "raw" strings. The items and raw strings are kept in separate
arrays, but those arrays will always have the same length (because we
write them only via refspec_append_nodup(), which grows both). This can
lead to bugs when manipulating the array, since the arrays and lengths
must be modified in lockstep. For example, the bug fixed in the previous
commit came from forgetting to decrement raw_nr.

So let's get rid of "raw_nr" and have only "nr", making this kind of bug
impossible (and also making it clear that the two are always matched,
something that existing code already assumed but was not guaranteed by
the interface).

Even though we'd expect "alloc" and "raw_alloc" to likewise move in
lockstep, we still need to keep separate counts there if we want to
continue to use ALLOC_GROW() for both.
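The bookkeeping this leaves behind (one shared count, two capacities) can be sketched as follows. `GROW()` is a hypothetical stand-in for Git's ALLOC_GROW() that doubles capacity, and `struct refspec_sketch` is an invented, simplified layout. The point is that a single `nr` advances once per append, so the two arrays cannot drift apart in length.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical analogue of ALLOC_GROW(): ensure room for 'need' elements. */
#define GROW(arr, need, alloc) do { \
	if ((need) > (alloc)) { \
		size_t new_alloc_ = (alloc) ? (alloc) * 2 : 4; \
		if (new_alloc_ < (need)) \
			new_alloc_ = (need); \
		void *p_ = realloc((arr), new_alloc_ * sizeof(*(arr))); \
		if (!p_) \
			abort(); \
		(arr) = p_; \
		(alloc) = new_alloc_; \
	} \
} while (0)

struct item_sketch {
	int force;
};

/* One shared count 'nr'; two capacities because each array grows on its own. */
struct refspec_sketch {
	struct item_sketch *items;
	const char **raw;
	size_t nr;
	size_t alloc;
	size_t raw_alloc;
};

void refspec_sketch_append(struct refspec_sketch *rs, struct item_sketch item,
			   const char *raw)
{
	assert(rs != NULL);
	GROW(rs->items, rs->nr + 1, rs->alloc);
	GROW(rs->raw, rs->nr + 1, rs->raw_alloc);
	rs->items[rs->nr] = item;
	rs->raw[rs->nr] = raw;
	rs->nr++; /* both arrays advance together; no raw_nr to forget */
}
```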

Conceptually this would all be simpler if refspec_item just held onto
its own raw string, and we had a single array. But there are callers
which use refspec_item outside of "struct refspec" (and so don't hold on
to a matching "raw" string at all), which we'd possibly need to adjust.
So let's not worry about refactoring that for now, and just get rid of
the redundant count variable. That is the first step on the road to
combining them anyway.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 18:16:48 +09:00
b970509c59 fetch: adjust refspec->raw_nr when filtering prefetch refspecs
In filter_prefetch_refspecs(), we may remove one or more refspecs if
they point into refs/tags/. When we do, we remove the item from the
refspec->items array, shifting subsequent items down, and then decrement
the refspec->nr count.

We also remove the item from the refspec->raw array, but fail to
decrement refspec->raw_nr. This leaves us with a count that is too high,
and anybody looking at the "raw" array will erroneously see either:

  1. The removed entry, if there were no subsequent items to shift down.

  2. A duplicate of the final entry, as everything is shifted down but
     there was nothing to overwrite the final item.
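The removal that filter_prefetch_refspecs() needs, with the missing decrement restored, can be sketched as below. The struct is hypothetical and models the two-count layout from before raw_nr was dropped. With both counts decremented, a later clear frees each remaining string exactly once, which avoids both failure modes listed above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-count layout, as before raw_nr was dropped. */
struct prefetch_sketch {
	const char **items; /* stand-in for the parsed refspec_item array */
	char **raw;         /* owned copies of the raw refspec strings */
	size_t nr;
	size_t raw_nr;
};

/* Remove entry 'i', shifting later entries down in BOTH arrays. */
void prefetch_sketch_remove(struct prefetch_sketch *rs, size_t i)
{
	assert(i < rs->nr && rs->nr == rs->raw_nr);
	free(rs->raw[i]);
	memmove(rs->items + i, rs->items + i + 1,
		(rs->nr - i - 1) * sizeof(*rs->items));
	memmove(rs->raw + i, rs->raw + i + 1,
		(rs->raw_nr - i - 1) * sizeof(*rs->raw));
	rs->nr--;
	rs->raw_nr--; /* the decrement the original code forgot */
}

/* Frees every live raw string once; a stale raw_nr would over-free here. */
void prefetch_sketch_clear(struct prefetch_sketch *rs)
{
	for (size_t i = 0; i < rs->raw_nr; i++)
		free(rs->raw[i]);
	rs->nr = rs->raw_nr = 0;
}
```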

The most obvious way to run into this is calling refspec_clear(), which
will try to free the removed entry (case 1) or double-free the final
entry (case 2). But even though the bug has existed since the function
was added in 2e03115d0c (fetch: add --prefetch option, 2021-04-16), we
did not trigger it in the test suite. The --prefetch option is normally
only used with configured refspecs, and we never bother to call
refspec_clear() on those (they are stored as part of a struct remote,
which is held in a global variable).

But you could trigger case 2 manually like:

  git fetch --prefetch . refs/tags/foo refs/tags/bar

Ironically you couldn't trigger case 1, because the code accidentally
leaked the string in the raw array, and the two bugs (the leak and the
double-free) cancelled out. But when we fixed the leak in ea4780307c
(fetch: free "raw" string when shrinking refspec, 2024-09-24), it became
possible to trigger that, too, with a single item:

  git fetch --prefetch . refs/tags/foo

We can fix both cases by just correctly decrementing "raw_nr" when we
shrink the array. Even though we don't expect people to use --prefetch
with command-line refspecs, we'll add a test to make sure it behaves
well (like the test just before it, we're just confirming that the
filtered prefetch succeeds at all).

Reported-by: Eric Mills <ermills@epic.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 18:16:47 +09:00
c08589efdc index-pack: repack local links into promisor packs
Teach index-pack, when processing the objects in a pack with
--promisor specified on the command line, to repack local objects (and
the local objects that they refer to, recursively) referenced by these
objects into promisor packs.

This prevents the situation in which, when fetching from a promisor
remote, we end up with promisor objects (newly fetched) referring
to non-promisor objects (locally created prior to the fetch). This
situation may arise if the client had previously pushed objects to the
remote, for example. One issue that arises in this situation is that,
if the non-promisor objects become inaccessible except through promisor
objects (for example, if the branch pointing to them has moved to
point to the promisor object that refers to them), then GC will garbage
collect them. There are other ways to solve this, but the simplest
seems to be to enforce the invariant that we don't have promisor objects
referring to non-promisor objects.

This repacking is done from index-pack to minimize the performance
impact. During a fetch, the only time most objects are fully inflated
in memory is when their object ID is computed, so we also scan the
objects (to see which objects they refer to) during this time.

Also to minimize the performance impact, an object is calculated to be
local if it's a loose object or present in a non-promisor pack. (If it's
also in a promisor pack or referred to by an object in a promisor pack,
it is technically already a promisor object. But a misidentification
of a promisor object as a non-promisor object is relatively benign
here - we will thus repack that promisor object into a promisor pack,
duplicating it in the object store, but there is no correctness issue,
just an issue of inefficiency.)
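The traversal described above can be sketched as a depth-first walk. It starts at a freshly indexed promisor object and follows references into local objects, recursively. This is illustrative only. `struct obj_sketch`, its flags, and `mark_local_links()` are hypothetical, and the fixed `MAX_REFS` limit is a simplification. Real index-pack works on parsed object contents and writes packs rather than flipping flags in an in-memory array. "Local" follows the commit's definition: loose, or present in a non-promisor pack.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 4

/* Hypothetical, simplified object-graph node. */
struct obj_sketch {
	int is_loose;
	int in_non_promisor_pack;
	int in_promisor_pack;  /* set when (re)packed into a promisor pack */
	size_t nr_refs;
	size_t refs[MAX_REFS]; /* indices of referenced objects */
};

/* As in the commit: "local" means loose or in a non-promisor pack. */
static int is_local(const struct obj_sketch *o)
{
	return o->is_loose || o->in_non_promisor_pack;
}

/*
 * From promisor object 'start', mark every local object it reaches
 * (recursing only through local objects) for repacking into a promisor
 * pack. 'stack' and 'seen' must each hold 'nr' entries; 'seen' starts
 * zeroed. Returns the number of objects newly marked.
 */
size_t mark_local_links(struct obj_sketch *objs, size_t nr, size_t start,
			size_t *stack, unsigned char *seen)
{
	size_t top = 0, marked = 0;

	assert(start < nr);
	stack[top++] = start;
	seen[start] = 1;
	while (top) {
		const struct obj_sketch *o = &objs[stack[--top]];

		for (size_t r = 0; r < o->nr_refs; r++) {
			size_t ref = o->refs[r];

			if (seen[ref] || !is_local(&objs[ref]))
				continue; /* already handled, or already promisor */
			seen[ref] = 1;
			objs[ref].in_promisor_pack = 1; /* would be repacked */
			marked++;
			stack[top++] = ref; /* follow what this local object references */
		}
	}
	return marked;
}
```

As the commit notes, an object that is both in a non-promisor pack and already a promisor object may be marked here too. That costs a duplicate copy but does not affect correctness.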

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 10:18:16 +09:00