Commit Graph

76201 Commits

Author SHA1 Message Date
241499aba0 send-email: add mailmap support via sendemail.mailmap and --mailmap
In some cases, a user may be generating a patch for an old commit which
now has an out-of-date author or other identity. For example, consider a
team member who contributes to an internal fork of an upstream project,
but leaves before this change is submitted upstream.

In this case, the team member's company address may no longer be valid,
and will thus bounce when sending email.

This can be manually avoided by editing the generated patch files, or by
carefully using --suppress-<cc|to> options. This requires a lot of
manual intervention and is easy to forget.

Git has support for mapping old email addresses and names to a canonical
name and address via the .mailmap file (and its associated mailmap.file,
mailmap.blob, and log.mailmap options).

Teach git send-email to enable mailmap support for all addresses. This
ensures that addresses point to the canonical real name and email
address.

Add the sendemail.mailmap configuration option and its associated
--mailmap (and --use-mailmap for compatibility with git log) options.
For now, the default behavior is to disable the mailmap in order to
avoid any surprises or breaking any existing setups.

These options support per-identity configuration via the
sendemail.identity configuration blocks. This enables identity-specific
configuration in cases where users may not want to enable support.

In addition, support send-email specific mailmap data via
sendemail.mailmap.file, sendemail.mailmap.blob and their
identity-specific variants.

The intention of these options is to enable mapping addresses which are
no longer valid to a current project or team maintainer. Such mappings
may change the actual person being referred to, and may not make sense
in a traditional mailmap file which is intended for updating canonical
name and address for the same individual.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:51:29 -07:00
f54ca6ae72 check-mailmap: add options for additional mailmap sources
The git check-mailmap command reads the mailmap from the default
.mailmap location and then from the mailmap.blob and mailmap.file
configurations.

A following change to git send-email will want to support new
configuration options based on the configured identity. The
identity-based configuration and options only make sense in the context
of git send-email.

Expose the read_mailmap_file and read_mailmap_blob functions from
mailmap.c.  Teach git check-mailmap the --mailmap-file and
--mailmap-blob options which load the additional mailmap sources.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:51:29 -07:00
3a27e991f2 check-mailmap: accept "user@host" contacts
git check-mailmap splits each provided contact using split_ident_line.
This function requires that the contact either be of the form "Name
<user@host>" or of the form "<user@host>". In particular, if the mail
portion of the contact is not surrounded by angle brackets,
split_ident_line will reject it.

This results in git check-mailmap rejecting attempts to translate simple
email addresses:

  $ git check-mailmap user@host
  fatal: unable to parse contact: user@host

This limits the usability of check-mailmap as it requires placing angle
brackets around plain email addresses.

In particular, attempting to use git check-mailmap to support mapping
addresses in git send-email is not straightforward. The sanitization
and validation functions in git send-email strip angle brackets from
plain email addresses. It is not trivial to add brackets prior to
invoking git check-mailmap.

Instead, modify check_mailmap() to allow such strings as contacts. In
particular, treat any line which cannot be split by split_ident_line as
a simple email address.

No attempt is made to actually parse the address line, or validate that
it is actually an email address. Implementing such validation is not
trivial. Besides, we weren't validating the address between angle
brackets before anyway.
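
As a rough illustration of the fallback described above (not Git's actual
code), the sketch below first tries a bracketed parse and otherwise treats
the whole string as a bare address without validating it. The names
`struct contact`, `split_bracketed()` and `parse_contact()` are hypothetical;
`split_bracketed()` is only a simplified stand-in for split_ident_line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct contact {
	const char *name;
	size_t name_len;
	const char *mail;
	size_t mail_len;
};

/*
 * Simplified stand-in for split_ident_line(): accepts only
 * "Name <user@host>" or "<user@host>". Returns 0 on success, -1 otherwise.
 */
static int split_bracketed(const char *line, struct contact *c)
{
	const char *lt = strchr(line, '<');
	const char *gt = lt ? strchr(lt, '>') : NULL;

	if (!lt || !gt)
		return -1;
	c->name = line;
	c->name_len = (size_t)(lt - line);
	/* Trim spaces between the name and '<'. */
	while (c->name_len && c->name[c->name_len - 1] == ' ')
		c->name_len--;
	c->mail = lt + 1;
	c->mail_len = (size_t)(gt - lt - 1);
	return 0;
}

/* Fall back to treating the whole line as a bare, unvalidated address. */
static void parse_contact(const char *line, struct contact *c)
{
	if (split_bracketed(line, c) == 0)
		return;
	c->name = NULL;
	c->name_len = 0;
	c->mail = line;
	c->mail_len = strlen(line);
}
```

With this shape, "user@host" yields an empty name and the full string as the
mail part, while "Name <user@host>" is split as before.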

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:51:28 -07:00
d3e7db2b82 builtin/pack-objects.c: do not open-code MAX_PACK_OBJECT_HEADER
The function `write_reused_pack_one()` defines a header to store the
OFS_DELTA header, but uses the constant "10" instead of
"MAX_PACK_OBJECT_HEADER" (as is done elsewhere in the same patch, circa
bb514de356 (pack-objects: improve partial packfile reuse, 2019-12-18)).

Declare the `ofs_header` field to be sized according to
`MAX_PACK_OBJECT_HEADER` (which is 10, as defined in "pack.h") instead
of the constant 10.
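
To show why a named bound reads better than a bare 10, here is a minimal
sketch that fills a buffer sized by the constant. The helper `encode_ofs()`
is hypothetical, the constant is redefined locally with the value the commit
cites, and the 7-bits-per-byte encoding is modeled on how OFS_DELTA offsets
are commonly described rather than copied from pack-objects.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the value cited from "pack.h" in the message above. */
#define MAX_PACK_OBJECT_HEADER 10

/*
 * Hypothetical helper: encode a delta base offset into the tail of `buf`
 * and return the index of the first byte used. The encoded bytes are
 * buf[i] .. buf[MAX_PACK_OBJECT_HEADER - 1].
 */
static size_t encode_ofs(uint64_t ofs, unsigned char buf[MAX_PACK_OBJECT_HEADER])
{
	size_t i = MAX_PACK_OBJECT_HEADER - 1;

	buf[i] = ofs & 127;
	while (ofs >>= 7)
		buf[--i] = 128 | (--ofs & 127);
	return i;
}
```

Under this scheme even the largest 64-bit offset fills the buffer exactly,
which is the relationship the named constant documents.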

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:50:27 -07:00
db40e3c92b pack-bitmap.c: avoid repeated pack_pos_to_offset() during reuse
When calling `try_partial_reuse()`, the (sole) caller from the function
`reuse_partial_packfile_from_bitmap_1()` has to translate its bit
position to a pack position.

In the MIDX bitmap case, the caller translates from the bit position to
a position in the MIDX's pseudo-pack order (with `pack_pos_to_midx()`),
then gets a pack offset (with `nth_midxed_offset()`) before finally
working backwards to get the pack position in the source pack by calling
`offset_to_pack_pos()`.

In the non-MIDX bitmap case, we can use the bit position as the pack
position directly (see the comment at the beginning of the
`reuse_partial_packfile_from_bitmap_1()` function for why).

In either case, the first thing that `try_partial_reuse()` does after
being called is determine the offset of the object at the given pack
position by calling `pack_pos_to_offset()`. But we already have that
information in the MIDX case!

Avoid re-computing that information by instead passing it in. In the
MIDX case, we already have that information stored. In the non-MIDX
case, the call to `pack_pos_to_offset()` moves from the function
`try_partial_reuse()` to its caller. In total, we'll save one call to
`pack_pos_to_offset()` when processing MIDX bitmaps.

(On my machine, there is a slight speed-up on the order of ~2ms, but it
is within the margin of error over 10 runs, so I think you'd have to
have a truly gigantic repository to confidently measure any significant
improvement here).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:50:27 -07:00
125c32605a builtin/pack-objects.c: translate bit positions during pack-reuse
When reusing chunks verbatim from an existing source pack, the function
write_reused_pack() first attempts to reuse whole words (via the
function `write_reused_pack_verbatim()`), and then individual bits (via
`write_reused_pack_one()`).

In the non-MIDX case, all of this code works fine. Likewise, in the MIDX
case, processing bits individually from the first (preferred) pack works
fine. However, processing subsequent packs in the MIDX case is broken
when there are duplicate objects among the set of MIDX'd packs.

This is because we treat the individual bit positions as valid pack
positions within the source pack(s), which does not account for gaps in
the source pack, like we see when the MIDX must break ties between
duplicate objects which appear in multiple packs.

The broken code looks like:

    for (; i < reuse_packfile_bitmap->word_alloc; i++) {
            for (offset = 0; offset < BITS_IN_EWORD; offset++) {
                    /* ... */

                    write_reused_pack_one(reuse_packfile->p,
                                          pos + offset - reuse_packfile->bitmap_pos,
                                          f, pack_start, &w_curs);
            }
    }

, where the second argument is incorrect and does not account for gaps.

Instead, make sure that we translate bit positions in the MIDX's
pseudo-pack order to pack positions in the respective source packs by:

  - Translating the bit position (pseudo-pack order) to a MIDX position
    (lexical order).

  - Using the MIDX position to obtain the offset at which the given object
    occurs in the source pack.

  - Translating that offset back into a pack-relative position within
    the source pack by calling offset_to_pack_pos().

After doing this, we can safely use the result as a pack position.
Note that when doing single-pack reuse, as well as reusing objects from
the MIDX's preferred pack, such translation is not necessary, since
either ties are broken in favor of the preferred pack, or there are no
ties to break at all (in the case of non-MIDX bitmaps).
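
The three translation steps can be pictured with a toy model. Everything
below is hypothetical (the `toy_*` types, the tables, and the offsets); it
only assumes that a non-preferred pack contains one object whose tie was
broken in favor of another pack, so that object has no bit in this pack's
range.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy pack: object offsets in pack order (ascending). */
struct toy_pack {
	const uint64_t *offsets;
	size_t nr;
	size_t bitmap_pos; /* first bit belonging to this pack */
};

/* Toy MIDX: bit -> lexical position, lexical position -> offset. */
struct toy_midx {
	const size_t *midx_pos_by_bit;
	const uint64_t *offset_by_midx_pos;
	size_t nr;
};

/*
 * Pack B has four objects; the one at offset 50 duplicates an object in
 * the preferred pack A, so only three bits (3, 4, 5) belong to pack B.
 */
static const uint64_t pack_b_offsets[] = { 12, 50, 77, 130 };
static const size_t midx_pos_by_bit[] = { 1, 3, 5, 4, 0, 2 };
static const uint64_t offset_by_midx_pos[] = { 77, 12, 130, 40, 12, 90 };

/* Stand-in for offset_to_pack_pos(): binary search, -1 if not found. */
static long toy_offset_to_pack_pos(const struct toy_pack *p, uint64_t ofs)
{
	size_t lo = 0, hi = p->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (p->offsets[mid] == ofs)
			return (long)mid;
		if (p->offsets[mid] < ofs)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}

/* The broken approach: treat the bit index as a pack position. */
static long naive_pos(const struct toy_pack *p, size_t bit)
{
	return (long)(bit - p->bitmap_pos);
}

/* bit -> MIDX position -> offset -> pack position */
static long translated_pos(const struct toy_midx *m,
			   const struct toy_pack *p, size_t bit)
{
	if (bit >= m->nr)
		return -1;
	return toy_offset_to_pack_pos(p,
			m->offset_by_midx_pos[m->midx_pos_by_bit[bit]]);
}
```

For bit 4 the naive arithmetic lands on the duplicate at pack position 1,
whereas the translated lookup finds the intended object at position 2.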

Failing to do this can result in strange failure modes. One example that
can occur when misinterpreting bits in the above fashion is that Git
thinks it's supposed to send a delta that the caller does not want.
Under this (incorrect) assumption, we try to look up the delta's base
(so that we can patch any OFS_DELTAs if necessary). We do this using
find_reused_offset().

But if we try and call that function for an offset belonging to an
object we did not send, we'll get back garbage. This can result in us
computing a negative fixup value, which results in memory corruption
when trying to write the (patched) OFS_DELTA header.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:50:26 -07:00
41cd4b478f pack-bitmap: tag bitmapped packs with their corresponding MIDX
The next commit will need to use the bitmap's MIDX (if one exists) to
translate bit positions into pack-relative positions in the source pack.

Ordinarily, we'd use the "midx" field of the bitmap_index struct. But
since that struct is defined within pack-bitmap.c, and our caller is in
a separate compilation unit, we do not have access to the MIDX field.

Instead, add a "from_midx" field to the bitmapped_pack structure so that
we can use that piece of data from outside of pack-bitmap.c. The caller
that uses this new piece of information will be added in the following
commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:50:26 -07:00
bbc393a9f3 t/t5332-multi-pack-reuse.sh: verify pack generation with --strict
In our tests for multi-pack reuse, we have two helper functions:

  - test_pack_objects_reused_all(), and
  - test_pack_objects_reused()

which invoke pack-objects (either with `--all`, or the supplied tips via
stdin, respectively) and ensure that (a) the number of reused objects,
and (b) the number of packs which those objects were reused from both
match the expected values.

Both functions discard the output of pack-objects and assert only on the
contents of the trace2 stream.

However, if we store the pack and attempt to index it with `--strict`,
we find that a number of our tests are broken, indicating a bug within
multi-pack reuse.

That bug will be addressed in a subsequent commit. But let's first
harden these tests by trying to index the resulting pack, marking the
tests which fail appropriately.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-27 14:50:26 -07:00
1609470409 git-config.1: fix description of --regexp in synopsis
The synopsis says --regexp=<regexp> but the --regexp option is a
Boolean that says "the name given is not literal, but a pattern to
match the name".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-26 11:49:37 -07:00
686e9f616f git-config.1: --get-all description update
"git config --get-all foo.bar" shows all values for the foo.bar
variable, but does not give the variable name in each output entry.
Hence it is equivalent to "git config get --all foo.bar", without
"--show-names", in the more modern syntax.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-26 11:49:27 -07:00
159f2d50e7 Sync with 'maint'
2024-08-26 11:38:08 -07:00
b63a92d515 The ninth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-26 11:32:24 -07:00
27d4f4032e Merge branch 'jc/coding-style-c-operator-with-spaces'
Write down whitespacing rules around C operators.

* jc/coding-style-c-operator-with-spaces:
  CodingGuidelines: spaces around C operators
2024-08-26 11:32:24 -07:00
3222718ad7 Merge branch 'ds/for-each-ref-is-base'
'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.

* ds/for-each-ref-is-base:
  p1500: add is-base performance tests
  for-each-ref: add 'is-base' token
  commit: add gentle reference lookup method
  commit-reach: add get_branch_base_for_tip
2024-08-26 11:32:24 -07:00
3dd2a2feca Merge branch 'jk/send-email-translate-aliases'
"git send-email" learned "--translate-aliases" option that reads
addresses from the standard input and emits the result of applying
aliases on them to the standard output.

* jk/send-email-translate-aliases:
  send-email: teach git send-email option to translate aliases
  t9001-send-email.sh: update alias list used for pine test
  t9001-send-email.sh: fix quoting for mailrc --dump-aliases test
2024-08-26 11:32:23 -07:00
2b30d66c43 Merge branch 'jk/mark-unused-parameters'
Mark unused parameters as UNUSED to squelch -Wunused warnings.

* jk/mark-unused-parameters:
  t-hashmap: stop calling setup() for t_intern() test
  scalar: mark unused parameters in dummy function
  daemon: mark unused parameters in non-posix fallbacks
  setup: mark unused parameter in config callback
  test-mergesort: mark unused parameters in trivial callback
  t-hashmap: mark unused parameters in callback function
  reftable: mark unused parameters in virtual functions
  reftable: drop obsolete test function declarations
  reftable: ignore unused argc/argv in test functions
  unit-tests: ignore unused argc/argv
  t/helper: mark more unused argv/argc arguments
  oss-fuzz: mark unused argv/argc argument
  refs: mark unused parameters in do_for_each_reflog_helper()
  refs: mark unused parameters in ref_store fsck callbacks
  update-ref: mark more unused parameters in parser callbacks
  imap-send: mark unused parameter in ssl_socket_connect() fallback
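
For readers unfamiliar with the pattern, a minimal sketch of annotating a
parameter that a callback signature requires but the body never reads. The
`UNUSED` definition here is a simplified stand-in (Git's own macro may be
defined differently), and `each_item_fn`, `count_item()` and
`for_each_item()` are hypothetical names.

```c
#include <stddef.h>

/* Simplified stand-in for an UNUSED annotation. */
#if defined(__GNUC__)
#define UNUSED __attribute__((unused))
#else
#define UNUSED
#endif

typedef int (*each_item_fn)(const char *item, void *cb_data);

/* The callback type forces an `item` parameter that this callback ignores. */
static int count_item(const char *item UNUSED, void *cb_data)
{
	(*(size_t *)cb_data)++;
	return 0;
}

/* Calls fn on each item, stopping at the first non-zero return. */
static int for_each_item(const char *const *items, size_t nr,
			 each_item_fn fn, void *cb_data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(items[i], cb_data);

		if (ret)
			return ret;
	}
	return 0;
}
```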
2024-08-26 11:32:23 -07:00
2ff26d2286 Merge branch 'jk/drop-unused-parameters'
Drop unused parameters from functions.

* jk/drop-unused-parameters:
  diff-lib: drop unused index argument from get_stat_data()
  ref-filter: drop unused parameters from email_atom_option_parser()
  pack-bitmap: drop unused parameters from select_pseudo_merges()
  pack-bitmap: load writer config from repository parameter
  refs: drop some unused parameters from create_symref_lock()
2024-08-26 11:32:22 -07:00
1f4d89dfce Merge branch 'tb/pseudo-merge-bitmap-fixes'
We created a useless pseudo-merge reachability bitmap that is about
0 commits, and attempted to include commits that are not in packs,
which made no sense.  These bugs have been corrected.

* tb/pseudo-merge-bitmap-fixes:
  pseudo-merge.c: ensure pseudo-merge groups are closed
  pseudo-merge.c: do not generate empty pseudo-merge commits
  t/t5333-pseudo-merge-bitmaps.sh: demonstrate empty pseudo-merge groups
  pack-bitmap-write.c: select pseudo-merges even for small bitmaps
  pack-bitmap: drop redundant args from `bitmap_writer_finish()`
  pack-bitmap: drop redundant args from `bitmap_writer_build()`
  pack-bitmap: drop redundant args from `bitmap_writer_build_type_index()`
  pack-bitmap: initialize `bitmap_writer_init()` with packing_data
2024-08-26 11:32:21 -07:00
6e6f68b59b Merge branch 'ps/maintenance-detach-fix-more'
Tests for "git maintenance" that were broken on Windows have been
corrected.

* ps/maintenance-detach-fix-more:
  builtin/maintenance: fix loose objects task emitting pack hash
  t7900: exercise detaching via trace2 regions
  t7900: fix flaky test due to leaking background job
2024-08-26 11:32:20 -07:00
1e8962ee08 Merge branch 'ps/maintenance-detach-fix'
Maintenance tasks other than "gc" now properly go background when
"git maintenance" runs them.

* ps/maintenance-detach-fix:
  run-command: fix detaching when running auto maintenance
  builtin/maintenance: add a `--detach` flag
  builtin/gc: add a `--detach` flag
  builtin/gc: stop processing log file on signal
  builtin/gc: fix leaking config values
  builtin/gc: refactor to read config into structure
  config: fix constness of out parameter for `git_config_get_expiry()`
2024-08-26 11:32:20 -07:00
6809f8ccad A bit more topics for 2.46.x maintenance track
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-26 11:13:19 -07:00
5072ad8260 Merge branch 'xx/diff-tree-remerge-diff-fix' into maint-2.46
"git rev-list ... | git diff-tree -p --remerge-diff --stdin" should
behave more or less like "git log -p --remerge-diff" but instead it
crashed, forgetting to prepare a temporary object store needed.

* xx/diff-tree-remerge-diff-fix:
  diff-tree: fix crash when used with --remerge-diff
2024-08-26 11:10:25 -07:00
164cffa35c Merge branch 'rs/t-example-simplify' into maint-2.46
Unit test simplification.

* rs/t-example-simplify:
  t-example-decorate: remove test messages
2024-08-26 11:10:24 -07:00
c93649f98a Merge branch 'jc/safe-directory' into maint-2.46
Follow-up on 2.45.1 regression fix.

* jc/safe-directory:
  safe.directory: setting safe.directory="." allows the "current" directory
  safe.directory: normalize the configured path
  safe.directory: normalize the checked path
  safe.directory: preliminary clean-up
2024-08-26 11:10:24 -07:00
b452be06ff Merge branch 'jc/document-use-of-local' into maint-2.46
Doc update.

* jc/document-use-of-local:
  doc: note that AT&T ksh does not work with our test suite
2024-08-26 11:10:23 -07:00
9a7bd3d0cb Merge branch 'rs/use-decimal-width' into maint-2.46
Code clean-up.

* rs/use-decimal-width:
  log-tree: use decimal_width()
2024-08-26 11:10:23 -07:00
5d0870d68c Merge branch 'ss/packed-ref-store-leakfix' into maint-2.46
Leakfix.

* ss/packed-ref-store-leakfix:
  refs/files: prevent memory leak by freeing packed_ref_store
2024-08-26 11:10:22 -07:00
24a64ea0eb Merge branch 'kl/test-fixes' into maint-2.46
A flaky test and incorrect calls to strtoX() functions have been
fixed.

* kl/test-fixes:
  t6421: fix test to work when repo dir contains d0
  set errno=0 before strtoX calls
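
The strtoX() item refers to a standard C idiom: these functions report range
errors only through errno, so a stale value from an earlier call can look
like a failure unless errno is cleared first. A minimal sketch, with
`parse_long()` as a hypothetical helper rather than anything from Git:

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 0 and stores the value on success, -1 on any parse failure. */
static int parse_long(const char *s, long *out)
{
	char *end;
	long val;

	errno = 0; /* clear any stale error before the call */
	val = strtol(s, &end, 10);
	if (errno)
		return -1; /* out of range, or another reported error */
	if (end == s || *end != '\0')
		return -1; /* no digits, or trailing garbage */
	*out = val;
	return 0;
}
```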
2024-08-26 11:10:21 -07:00
710ef8a945 Merge branch 'jc/reflog-expire-lookup-commit-fix' into maint-2.46
"git reflog expire" failed to honor annotated tags when computing
reachable commits.

* jc/reflog-expire-lookup-commit-fix:
  Revert "reflog expire: don't use lookup_commit_reference_gently()"
2024-08-26 11:10:21 -07:00
7bba1bd806 Merge branch 'jr/ls-files-expand-literal-doc' into maint-2.46
Docfix.

* jr/ls-files-expand-literal-doc:
  doc: fix hex code escapes in git-ls-files
2024-08-26 11:10:20 -07:00
528a762ca6 Merge branch 'jc/leakfix-mailmap' into maint-2.46
Leakfix.

* jc/leakfix-mailmap:
  mailmap: plug memory leak in read_mailmap_blob()
2024-08-26 11:10:20 -07:00
88639e5d4c Merge branch 'jc/leakfix-hashfile' into maint-2.46
Leakfix.

* jc/leakfix-hashfile:
  csum-file: introduce discard_hashfile()
2024-08-26 11:10:19 -07:00
a5e4f53baf Merge branch 'jc/jl-git-no-advice-fix' into maint-2.46
Remove leftover debugging cruft from a test script.

* jc/jl-git-no-advice-fix:
  t0018: remove leftover debugging cruft
2024-08-26 11:10:19 -07:00
5613c83f30 Merge branch 'tb/config-fixed-value-with-valueless-true' into maint-2.46
"git config --value=foo --fixed-value section.key newvalue" barfed
when the existing value in the configuration file used the
valueless true syntax, which has been corrected.

* tb/config-fixed-value-with-valueless-true:
  config.c: avoid segfault with --fixed-value and valueless config
2024-08-26 11:10:18 -07:00
a991ffff92 Merge branch 'ps/ls-remote-out-of-repo-fix' into maint-2.46
A recent update broke "git ls-remote" used outside a repository,
which has been corrected.

* ps/ls-remote-out-of-repo-fix:
  builtin/ls-remote: fall back to SHA1 outside of a repo
2024-08-26 11:10:18 -07:00
87f8426bf7 Merge branch 'jk/osxkeychain-username-is-nul-terminated' into maint-2.46
The credential helper to talk to OSX keychain sometimes sent
garbage bytes after the username, which has been corrected.

* jk/osxkeychain-username-is-nul-terminated:
  credential/osxkeychain: respect NUL terminator in username
2024-08-26 11:10:17 -07:00
4e7aa344f2 remote: plug memory leaks at early returns
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 14:20:07 -07:00
6a09c36371 The eighth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 09:02:36 -07:00
62c5b88157 Merge branch 'ps/stash-keep-untrack-empty-fix'
A corner case bug in "git stash" was fixed.

* ps/stash-keep-untrack-empty-fix:
  builtin/stash: fix `--keep-index --include-untracked` with empty HEAD
2024-08-23 09:02:36 -07:00
2cf9c2206c Merge branch 'ps/hash-and-ref-format-from-config'
The default object hash and ref backend format used to be settable
only with explicit command line option to "git init" and
environment variables, but now they can be configured in the user's
global and system wide configuration.

* ps/hash-and-ref-format-from-config:
  setup: make ref storage format configurable via config
  setup: make object format configurable via config
  setup: merge configuration of repository formats
  t0001: delete repositories when object format tests finish
  t0001: exercise initialization with ref formats more thoroughly
2024-08-23 09:02:36 -07:00
668843e6d8 Merge branch 'cp/unit-test-reftable-readwrite'
* cp/unit-test-reftable-readwrite:
  t-reftable-readwrite: add test for known error
  t-reftable-readwrite: use 'for' in place of infinite 'while' loops
  t-reftable-readwrite: use free_names() instead of a for loop
  t: move reftable/readwrite_test.c to the unit testing framework
2024-08-23 09:02:35 -07:00
5e56a39e6a Merge branch 'ps/config-wo-the-repository'
Use of API functions that implicitly depend on the_repository
object in the config subsystem has been rewritten to pass a
repository object through the callchain.

* ps/config-wo-the-repository:
  config: hide functions using `the_repository` by default
  global: prepare for hiding away repo-less config functions
  config: don't depend on `the_repository` with branch conditions
  config: don't have setters depend on `the_repository`
  config: pass repo to functions that rename or copy sections
  config: pass repo to `git_die_config()`
  config: pass repo to `git_config_get_expiry_in_days()`
  config: pass repo to `git_config_get_expiry()`
  config: pass repo to `git_config_get_max_percent_split_change()`
  config: pass repo to `git_config_get_split_index()`
  config: pass repo to `git_config_get_index_threads()`
  config: expose `repo_config_clear()`
  config: introduce missing setters that take repo as parameter
  path: hide functions using `the_repository` by default
  path: stop relying on `the_repository` in `worktree_git_path()`
  path: stop relying on `the_repository` when reporting garbage
  hooks: remove implicit dependency on `the_repository`
  editor: do not rely on `the_repository` for interactive edits
  path: expose `do_git_common_path()` as `repo_common_pathv()`
  path: expose `do_git_path()` as `repo_git_pathv()`
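
The general shape of this kind of rewrite, as a toy sketch: a function that
silently reads a process-wide global is replaced by one that takes the
repository as a parameter. The `toy_repo` type and both getters are
hypothetical and far simpler than the real config API.

```c
/* Toy repository object for this sketch only. */
struct toy_repo {
	const char *name;
	int index_threads;
};

/* Old style: the function implicitly depends on a global. */
static struct toy_repo *the_toy_repo;

static int get_index_threads_implicit(void)
{
	return the_toy_repo->index_threads;
}

/* New style: the caller states which repository it means. */
static int get_index_threads(const struct toy_repo *r)
{
	return r->index_threads;
}
```

The explicit form keeps working when more than one repository object is live
in the same process, which the implicit form cannot express.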
2024-08-23 09:02:34 -07:00
1b6b2bfae5 Merge branch 'ps/leakfixes-part-4'
More leak fixes.

* ps/leakfixes-part-4: (22 commits)
  builtin/diff: free symmetric diff members
  diff: free state populated via options
  builtin/log: fix leak when showing converted blob contents
  userdiff: fix leaking memory for configured diff drivers
  builtin/format-patch: fix various trivial memory leaks
  diff: fix leak when parsing invalid ignore regex option
  unpack-trees: clear index when not propagating it
  sequencer: release todo list on error paths
  merge-ort: unconditionally release attributes index
  builtin/fast-export: plug leaking tag names
  builtin/fast-export: fix leaking diff options
  builtin/fast-import: plug trivial memory leaks
  builtin/notes: fix leaking `struct notes_tree` when merging notes
  builtin/rebase: fix leaking `commit.gpgsign` value
  config: fix leaking comment character config
  submodule-config: fix leaking name entry when traversing submodules
  read-cache: fix leaking hashfile when writing index fails
  bulk-checkin: fix leaking state TODO
  object-name: fix leaking symlink paths in object context
  object-file: fix memory leak when reading corrupted headers
  ...
2024-08-23 09:02:33 -07:00
85da2a2ab6 reftable/stack: fix segfault when reload with reused readers fails
It is expected that reloading the stack fails with concurrent writers,
e.g. because a table that we just wanted to read just got compacted.
In case we decided to reuse readers this will cause a segfault though
because we unconditionally release all new readers, including the reused
ones. As those are still referenced by the current stack, the result is
that we will eventually try to dereference those already-freed readers.

Fix this bug by incrementing the refcount of reused readers temporarily.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 08:04:48 -07:00
1302ed68d4 reftable/stack: reorder swapping in the reloaded stack contents
The code flow of how we swap in the reloaded stack contents is somewhat
convoluted because we switch back and forth between swapping in
different parts of the stack.

Reorder the code to simplify it. We now first close and unlink the old
tables which do not get reused before we update the stack to point to
the new stack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 08:04:47 -07:00
89eada4ea1 reftable/reader: keep readers alive during iteration
The lifetime of a table iterator may survive the lifetime of a reader
when the stack gets reloaded. Keep the reader from being released by
increasing its refcount while the iterator is still being used.
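
A minimal sketch of the idea, assuming a reader that starts with one
reference owned by the stack and an iterator that takes its own reference
for as long as it is in use. All `toy_*` names are hypothetical and do not
mirror the reftable library's API; `live_readers` exists only so the sketch
can be checked.

```c
#include <stdlib.h>

struct toy_reader {
	int refcount;
};

static int live_readers; /* number of readers not yet freed */

/* Creates a reader holding one reference (owned by the stack). */
static struct toy_reader *toy_reader_new(void)
{
	struct toy_reader *r = calloc(1, sizeof(*r));

	if (!r)
		return NULL;
	r->refcount = 1;
	live_readers++;
	return r;
}

static void toy_reader_incref(struct toy_reader *r)
{
	r->refcount++;
}

/* Frees the reader only when the last reference goes away. */
static void toy_reader_decref(struct toy_reader *r)
{
	if (--r->refcount > 0)
		return;
	live_readers--;
	free(r);
}

struct toy_iter {
	struct toy_reader *reader;
};

/* The iterator pins the reader for its own lifetime. */
static void toy_iter_init(struct toy_iter *it, struct toy_reader *r)
{
	toy_reader_incref(r);
	it->reader = r;
}

static void toy_iter_release(struct toy_iter *it)
{
	toy_reader_decref(it->reader);
	it->reader = NULL;
}
```

If the stack drops its reference during a reload while an iterator is still
open, the reader survives until the iterator is released.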

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 08:04:47 -07:00
d857469d85 reftable/reader: introduce refcounting
It was recently reported that concurrent reads and writes may cause the
reftable backend to segfault. The root cause of this is that we do not
properly keep track of reftable readers across reloads.

Suppose that you have a reftable iterator and then decide to reload the
stack while iterating through the iterator. When the stack has been
rewritten since we have created the iterator, then we would end up
discarding a subset of readers that may still be in use by the iterator.
The consequence is that we now try to reference deallocated memory,
which of course segfaults.

One way to trigger this is in t5616, where some background maintenance
jobs have been leaking from one test into another. This leads to stack
traces like the following one:

  + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
  ==657994==The signal is caused by a READ memory access.
      #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
      #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
      #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
      #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
      #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
      #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
      #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
      #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
      #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
      #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
      #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
      #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
      #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
      #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
      #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
      #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
      #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
      #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
      #18 0x55f23dba7764 in run_builtin git.c:484
      #19 0x55f23dba7764 in handle_builtin git.c:741
      #20 0x55f23dbab61e in run_argv git.c:805
      #21 0x55f23dbab61e in cmd_main git.c:1000
      #22 0x55f23dba4781 in main common-main.c:64
      #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
      #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
      #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

While it is somewhat awkward that the maintenance processes survive
tests in the first place, it is totally expected that reftables should
work alright with concurrent writers. Seemingly they don't.

The only underlying resource that we need to care about in this context
is the reftable reader, which is responsible for reading a single table
from disk. These readers get discarded immediately (unless reused) when
calling `reftable_stack_reload()`, which is wrong. We can only close
them once we know that there are no iterators using them anymore.

Prepare for a fix by converting the reftable readers to be refcounted.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 08:04:47 -07:00
4ac2fd9b4a reftable/stack: fix broken refnames in write_n_ref_tables()
The `write_n_ref_tables()` helper function writes N references in
separate tables. We never reset the computed name of those references
though, leading us to end up with unexpected names.

Fix this by resetting the buffer.
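
The class of bug is easy to reproduce with any append-style buffer. The
sketch below uses a plain char array instead of Git's string buffer type,
and the name format, `append_refname()` and `make_names()` are hypothetical;
the `reset` flag toggles the fix.

```c
#include <stdio.h>
#include <string.h>

#define TOY_NAME_LEN 64

/* Appends a name to whatever `buf` already holds. */
static void append_refname(char *buf, size_t cap, int i)
{
	size_t len = strlen(buf);

	snprintf(buf + len, cap - len, "refs/heads/branch-%04d", i);
}

/* Fills out[0..n-1]; without `reset`, each name keeps the previous ones. */
static void make_names(char out[][TOY_NAME_LEN], int n, int reset)
{
	char buf[TOY_NAME_LEN] = "";

	for (int i = 0; i < n; i++) {
		if (reset)
			buf[0] = '\0'; /* the fix: start from an empty buffer */
		append_refname(buf, sizeof(buf), i);
		strcpy(out[i], buf);
	}
}
```

Without the reset, the second name comes out as the first name followed by
the second, which matches the "unexpected names" symptom described above.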

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 08:04:47 -07:00
00e130a6bb reftable/reader: inline reader_close()
Same as with the preceding commit, we also provide a `reader_close()`
function that allows the caller to close a reader without freeing it.
This is unnecessary now that all users will have an allocated version of
the reader.

Inline it into `reftable_reader_free()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 08:04:47 -07:00
2de3c0d345 reftable/reader: inline init_reader()
Most users use an allocated version of the `reftable_reader`, except for
some tests. We are about to convert the reader to become refcounted
though, and providing the ability to keep a reader on the stack makes
this conversion harder than necessary.

Update the tests to use `reftable_reader_new()` instead to prepare for
this change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 08:04:46 -07:00