Commit Graph

75760 Commits

Author SHA1 Message Date
1c299d03e5 refs: introduce "initial" transaction flag
There are two different ways to commit a transaction:

  - `ref_transaction_commit()` can be used to commit a regular
    transaction and is what almost every caller wants.

  - `initial_ref_transaction_commit()` can be used when it is known that
    the ref store that the transaction is committed for is empty and
    when there are no concurrent processes. This is used when cloning a
    new repository.

Implementing this via two separate functions has a couple of downsides.
First, every reference backend needs to implement a separate callback
even in the case where they don't special-case the initial transaction.
Second, backends are basically forced to reimplement the whole logic for
how to commit the transaction like the "files" backend does, even though
backends may wish to only tweak certain behaviour of a "normal" commit.
Third, it is awkward that callers must never prepare the transaction as
this is somewhat different than how a transaction typically works.

Refactor the code such that we instead mark initial transactions via a
separate flag when starting the transaction. This addresses all of the
mentioned painpoints, where the most important part is that it will
allow backends to have way more leeway in how exactly they want to
handle the initial transaction.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:15 +09:00
83b8ed8bba refs/files: move logic to commit initial transaction
Move the logic to commit initial transactions such that we can start to
call it in `files_transaction_finish()` in a subsequent commit without
requiring a separate function declaration.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:15 +09:00
a0efef1446 refs: allow passing flags when setting up a transaction
Allow passing flags when setting up a transaction such that the
behaviour of the transaction itself can be altered. This functionality
will be used in a subsequent patch.

Adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:14 +09:00
4083a6f052 Sync with 'maint' 2024-11-20 14:47:56 +09:00
44ac252971 The tenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20 14:47:17 +09:00
38e4df6615 Merge branch 'la/trailer-info'
Renaming a handful of variables and structure fields.

* la/trailer-info:
  trailer: spread usage of "trailer_block" language
2024-11-20 14:47:17 +09:00
ff44124044 Merge branch 'ja/git-add-doc-markup'
Documentation mark-up updates.

* ja/git-add-doc-markup:
  doc: git-add.txt: convert to new style convention
2024-11-20 14:47:17 +09:00
0c11ef1356 Merge branch 'jt/repack-local-promisor'
"git gc" discards any objects that are outside promisor packs that
are referred to by an object in a promisor pack, and we do not
refetch them from the promisor at runtime, resulting an unusable
repository.  Work it around by including these objects in the
referring promisor pack at the receiving end of the fetch.

* jt/repack-local-promisor:
  index-pack: repack local links into promisor packs
  t5300: move --window clamp test next to unclamped
  t0410: use from-scratch server
  t0410: make test description clearer
2024-11-20 14:47:16 +09:00
f1a384425d Prepare for 2.47.1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20 14:43:30 +09:00
cc53ddf7f0 Merge branch 'db/submodule-fetch-with-remote-name-fix' into maint-2.47
A "git fetch" from the superproject going down to a submodule used
a wrong remote when the default remote names are set differently
between them.

* db/submodule-fetch-with-remote-name-fix:
  submodule: correct remote name with fetch
2024-11-20 14:43:00 +09:00
257f2de964 Merge branch 'ps/cache-tree-w-broken-index-entry' into maint-2.47
Fail gracefully instead of crashing when attempting to write the
contents of a corrupt in-core index as a tree object.

* ps/cache-tree-w-broken-index-entry:
  unpack-trees: detect mismatching number of cache-tree/index entries
  cache-tree: detect mismatching number of index entries
  cache-tree: refactor verification to return error codes
2024-11-20 14:42:59 +09:00
76c1953395 Merge branch 'ps/maintenance-start-crash-fix' into maint-2.47
"git maintenance start" crashed due to an uninitialized variable
reference, which has been corrected.

* ps/maintenance-start-crash-fix:
  builtin/gc: fix crash when running `git maintenance start`
2024-11-20 14:42:58 +09:00
f1a50f12b9 Merge branch 'jk/fsmonitor-event-listener-race-fix' into maint-2.47
On macOS, fsmonitor can fall into a race condition that results in
a client waiting forever to be notified for an event that have
already happened.  This problem has been corrected.

* jk/fsmonitor-event-listener-race-fix:
  fsmonitor: initialize fs event listener before accepting clients
  simple-ipc: split async server initialization and running
2024-11-20 14:42:57 +09:00
3117dd359a Merge branch 'ds/line-log-asan-fix' into maint-2.47
Use after free and double freeing at the end in "git log -L... -p"
had been identified and fixed.

* ds/line-log-asan-fix:
  line-log: protect inner strbuf from free
2024-11-20 14:42:56 +09:00
1f2be8bed6 index-pack: teach --promisor to forbid pack name
Currently,

 - Running "index-pack --promisor" outside a repo segfaults.
 - It may be confusing to a user that running "index-pack --promisor"
   within a repo may make changes to the repo's object DB, especially
   since the packs indexed by the index-pack invocation may not even be
   related to the repo.

As discussed in [1] and [2], teaching --promisor to forbid a packfile
name solves both these problems. This combination of arguments requires
a repo (since we are writing the resulting .pack and .idx to it) and it
is clear that the files are related to the repo.

Currently, Git uses "index-pack --promisor" only when fetching into
a repo, so it could be argued that we should teach "index-pack" a
new argument (say, "--fetching-mode") instead of tying --promisor to
a generic argument like the packfile name. However, this --promisor
feature could conceivably be used whenever we have a packfile that is
known to come from the promisor remote (whether obtained through Git's
fetch protocol or through other means) so not using a new argument seems
reasonable - one could envision a user-made script obtaining a packfile
and then running "index-pack --promisor --stdin", for example. In fact,
it might be possible to relax the restriction further (say, by also
allowing --promisor when indexing a packfile that is in the object DB),
but relaxing the restriction is backwards-compatible so we can revisit
that later.

One thing to watch out for is the possibility of a future Git feature
that indexes a pack in the context of a repo, but does not necessarily
write the resulting pack to it (and does not necessarily desire to
make any changes to the object DB). One such feature would be fetch
quarantine, which might need the repo context in order to detect
hash collisions, but would also need to ensure that the object DB
is undisturbed in case the fetch fails for whatever reason, even if
the reason occurs only after the indexing is complete. It may not be
obvious to the implementer of such a feature that "index-pack" could
sometimes write packs other than the indexed pack to the object DB,
but there are already other ways that "fetch" could write to the object
DB (in particular, packfile URIs and bundle URIs), so hopefully the
implementation of this future feature would already include a test that
the object DB be undisturbed.

This change requires the change to t5300 by 1f52cdfacb (index-pack:
document and test the --promisor option, 2022-03-09) to be undone.
(--promisor is already tested indirectly, so we don't need the explicit
test here any more.)

[1] https://lore.kernel.org/git/20241114005652.GC1140565@coredump.intra.peff.net/
[2] https://lore.kernel.org/git/20241119185345.GB15723@coredump.intra.peff.net/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20 10:37:56 +09:00
656ca9204a builtin/gc: provide hint when maintenance hits a stale schedule lock
When running scheduled maintenance via `git maintenance start`, we
acquire a lockfile to ensure that no other scheduled maintenance task is
running in the repository concurrently. If so, we do provide an error to
the user hinting that another process seems to be running in this repo.

There are two important cases why such a lockfile may exist:

  - An actual git-maintenance(1) process is still running in this
    repository.

  - An earlier process may have crashed or was interrupted part way
    through and has left a stale lockfile behind.

In c95547a394 (builtin/gc: fix crash when running `git maintenance
start`, 2024-10-10), we have fixed an issue where git-maintenance(1)
would crash with the "start" subcommand, and the underlying bug causes
the second scenario to trigger quite often now.

Most users don't know how to get out of that situation again though.
Ideally, we'd be removing the stale lock for our users automatically.
But in the context of repository maintenance this is rather risky, as it
can easily run for hours or even days. So finding a clear point where we
know that the old process has exited is basically impossible.

We have the same issue in other subsystems, e.g. when locking refs. Our
lockfile interfaces thus provide the `unable_to_lock_message()` function
for exactly this purpose: it provides a nice hint to the user that
explains what is going on and how to get out of that situation again by
manually removing the file.

Adapt git-maintenance(1) to print a similar hint. While we could use the
above function, we can provide a bit more context as we know exactly
what kind of process would create the lockfile.

Reported-by: Miguel Rincon Barahona <mrincon@gitlab.com>
Reported-by: Kev Kloss <kkloss@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20 10:26:12 +09:00
f3b2ceea39 doc: git-diff: apply format changes to config part
By the way, we also change the sentences where git-diff would refer to
itself, so that no link is created in the HTML output.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:31:05 +09:00
0b080a70ab doc: git-diff: apply format changes to diff-generate-patch
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:31:05 +09:00
6ace09b2f9 doc: git-diff: apply format changes to diff-format
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:31:04 +09:00
6b552e39c0 doc: git-diff: apply format changes to diff-options
The format change is only applied to the sections of the file that are
filtered in git-diff.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:31:04 +09:00
e72c2d2e91 doc: git-diff: apply new documentation guidelines
The documentation for git-diff has been updated to follow the new
documentation guidelines. The following changes have been applied to
the series of patches:

- switching the synopsis to a synopsis block which will automatically
  format placeholders in italics and keywords in monospace
- use _<placeholder>_ instead of <placeholder> in the description
- use `backticks for keywords and more complex option
descriptions`. The new rendering engine will apply synopsis rules to
these spans.
- prevent git-diff from self-referencing itself via gitlink macro when
the generated link would point to the same page.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:31:04 +09:00
b8558e6abd Merge branch 'ps/reftable-detach' into ps/reftable-iterator-reuse
* ps/reftable-detach:
  reftable/system: provide thin wrapper for lockfile subsystem
  reftable/stack: drop only use of `get_locked_file_path()`
  reftable/system: provide thin wrapper for tempfile subsystem
  reftable/stack: stop using `fsync_component()` directly
  reftable/system: stop depending on "hash.h"
  reftable: explicitly handle hash format IDs
  reftable/system: move "dir.h" to its only user
2024-11-19 12:24:33 +09:00
988e7f5e95 reftable/system: provide thin wrapper for lockfile subsystem
We use the lockfile subsystem to write lockfiles for "tables.list". As
with the tempfile subsystem, the lockfile subsystem also hooks into our
infrastructure to prune stale locks via atexit(3p) or signal handlers.

Furthermore, the lockfile subsystem also handles locking timeouts, which
do add quite a bit of logic. Having to reimplement that in the context
of Git wouldn't make a whole lot of sense, and it is quite likely that
downstream users of the reftable library may have a better idea for how
exactly to implement timeouts.

So again, provide a thin wrapper for the lockfile subsystem instead such
that the compatibility shim is fully self-contained.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:23:11 +09:00
6361226b79 reftable/stack: drop only use of get_locked_file_path()
We've got a single callsite where we call `get_locked_file_path()`. As
we're about to convert our usage of the lockfile subsystem to instead be
used via a compatibility shim we'd have to implement more logic for this
single callsite. While that would be okay if Git was the only supposed
user of the reftable library, it's a bit more awkward when considering
that we have to reimplement this functionality for every user of the
library eventually.

Refactor the code such that we don't call `get_locked_file_path()`
anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:23:10 +09:00
01e49941d6 reftable/system: provide thin wrapper for tempfile subsystem
We use the tempfile subsystem to write temporary tables, but given that
we're in the process of converting the reftable library to become
standalone we cannot use this subsystem directly anymore. While we could
in theory convert the code to use mkstemp(3p) instead, we'd lose access
to our infrastructure that automatically prunes tempfiles via atexit(3p)
or signal handlers.

Provide a thin wrapper for the tempfile subsystem instead. Like this,
the compatibility shim is fully self-contained in "reftable/system.c".
Downstream users of the reftable library would have to implement their
own tempfile shims by replacing "system.c" with a custom version.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:23:10 +09:00
86b770b0bb reftable/stack: stop using fsync_component() directly
We're executing `fsync_component()` directly in the reftable library so
that we can fsync data to disk depending on "core.fsync". But as we're
in the process of converting the reftable library to become standalone
we cannot use that function in the library anymore.

Refactor the code such that users of the library can inject a custom
fsync function via the write options. This allows us to get rid of the
dependency on "write-or-die.h".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:23:10 +09:00
c2f08236ed reftable/system: stop depending on "hash.h"
We include "hash.h" in "reftable/system.h" such that we can use hash
format IDs as well as the raw size of SHA1 and SHA256. As we are in the
process of converting the reftable library to become standalone we of
course cannot rely on those constants anymore.

Introduce a new `enum reftable_hash` to replace internal uses of the
hash format IDs and new constants that replace internal uses of the hash
size. Adapt the reftable backend to set up the correct hash function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:23:10 +09:00
88e297275b reftable: explicitly handle hash format IDs
The hash format IDs are used for two different things across the
reftable codebase:

  - They are used as a 32 bit unsigned integer when reading and writing
    the header in order to identify the hash function.

  - They are used internally to identify which hash function is in use.

When one only considers the second usecase one might think that one can
easily change the representation of those hash IDs. But because those
IDs end up in the reftable header and footer on disk it is important
that those never change.

Create separate constants `REFTABLE_FORMAT_ID_*` and use them in
contexts where we read or write reftable headers. This serves multiple
purposes:

  - It allows us to more easily discern cases where we actually use
    those constants for the on-disk format.

  - It detangles us from the same constants that are defined in
    libgit.a, which is another required step to convert the reftable
    library to become standalone.

  - It makes the next step easier where we stop using `GIT_*_FORMAT_ID`
    constants in favor of a custom enum.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:23:09 +09:00
17e8039878 reftable/system: move "dir.h" to its only user
We still include "dir.h" in "reftable/system.h" even though it is not
used by anything but by a single unit test. Move it over into that unit
test so that we don't accidentally use any functionality provided by it
in the reftable codebase.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:23:08 +09:00
5e904f1a4a fast-import: avoid making replace refs point to themselves
If someone replaces a commit with a modified version, then builds on
that commit, and then later decides to rewrite history in a format like

    git fast-export --all | CMD_TO_TWEAK_THE_STREAM | git fast-import

and CMD_TO_TWEAK_THE_STREAM undoes the modifications that the
replacement did, then at the end you'd get a replace ref that points to
itself.  For example:

    $ git show-ref | grep replace
    fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264

Git commands which pay attention to replace refs will die with an error
when a self-referencing replace ref is present:

    $ git log
    fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264

Avoid such problems by deleting replace refs that will simply end up
pointing to themselves at the end of our writing.  Unless users specify
--quiet, warn them when we delete such a replace ref.

Two notes about this patch:
  * We are not ignoring the problematic update of the replace ref
    (turning it into a no-op), we are replacing the update with a delete.
    The logic here is that if the repository had a value for the replace
    ref before fast-import was run, and the replace ref was explicitly
    named in the fast-import stream, we don't want the replace ref to be
    left with a pre-fast-import value.
  * While loops with more than one element (e.g. refs/replace/A points
    to B, and refs/replace/B points to A) are possible, they seem much
    less plausible.  It is pretty easy to create a sequence of
    git-filter-repo commands that will trigger a self-referencing replace
    ref, but I do not know how to trigger a scenario with a cycle length
    greater than 1.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 09:39:33 +09:00
2af8ead52b object-file: inline empty tree and blob literals
We define macros with the bytes of the empty trees and blobs for sha1
and sha256. But since e1ccd7e2b1 (sha1_file: only expose empty object
constants through git_hash_algo, 2018-05-02), those are used only for
initializing the git_hash_algo entries. Any other code using the macros
directly would be suspicious, since a hash_algo pointer is the level of
indirection we use to make everything work with both sha1 and sha256.

So let's future proof against code doing the wrong thing by dropping the
macros entirely and just initializing the structs directly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 21:48:48 +09:00
e37feea00b object-file: treat cached_object values as const
The cached-object API maps oids to in-memory entries. Once inserted,
these entries should be immutable. Let's return them from the
find_cached_object() call with a const tag to make this clear.

Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 21:48:48 +09:00
9202ffcf10 object-file: drop oid field from find_cached_object() return value
The pretend_object_file() function adds to an array mapping oids to
object contents, which are later retrieved with find_cached_object().
We naturally need to store the oid for each entry, since it's the lookup
key.

But find_cached_object() also returns a hard-coded empty_tree object.
There we don't care about its oid field and instead compare against
the_hash_algo->empty_tree. The oid field is left as all-zeroes.

This all works, but it means that the cached_object struct we return
from find_cached_object() may or may not have a valid oid field, depend
whether it is the hard-coded tree or came from pretend_object_file().

Nobody looks at the field, so there's no bug. But let's future-proof it
by returning only the object contents themselves, not the oid. We'll
continue to call this "struct cached_object", and the array entry
mapping the key to those contents will be a "cached_object_entry".

This would also let us swap out the array for a better data structure
(like a hashmap) if we chose, but there's not much point. The only code
that adds an entry is git-blame, which adds at most a single entry per
process.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 21:48:48 +09:00
b2a95dfd63 object-file: move empty_tree struct into find_cached_object()
The fake empty_tree struct is a static global, but the only code that
looks at it is find_cached_object(). The struct itself is a little odd,
with an invalid "oid" field that is handled specially by that function.

Since it's really just an implementation detail, let's move it to a
static within the function. That future-proofs against other code trying
to use it and seeing the weird oid value.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 21:48:47 +09:00
2911f9ed1e object-file: drop confusing oid initializer of empty_tree struct
We treat the empty tree specially, providing an in-memory "cached" copy,
which allows you to diff against it even if the object doesn't exist in
the repository. This is implemented as part of the larger cached_object
subsystem, but we use a stand-alone empty_tree struct.

We initialize the oid of that struct using EMPTY_TREE_SHA1_BIN_LITERAL.
At first glance, that seems like a bug; how could this ever work for
sha256 repositories?

The answer is that we never look at the oid field! The oid field is used
to look up entries added by pretend_object_file() to the cached_objects
array. But for our stand-alone entry, we look for it independently using
the_hash_algo->empty_tree, which will point to the correct algo struct
for the repository.

This happened in 62ba93eaa9 (sha1_file: convert cached object code to
struct object_id, 2018-05-02), which even mentions that this field is
never used. Let's reduce confusion for anybody reading this code by
replacing the sha1 initializer with a comment. The resulting field will
be all-zeroes, so any violation of our assumption that the oid field is
not used will break equally for sha1 and sha256.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 21:48:47 +09:00
e770f36307 object-file: prefer array-of-bytes initializer for hash literals
We hard-code a few well-known hash values for empty trees and blobs in
both sha1 and sha256 formats. We do so with string literals like this:

  #define EMPTY_TREE_SHA256_BIN_LITERAL \
         "\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
         "\x04\xd4\x5d\x8d\x85\xef\xa9\xb0\x57\xb5" \
         "\x3b\x14\xb4\xb9\xb9\x39\xdd\x74\xde\xcc" \
         "\x53\x21"

and then use it to initialize the hash field of an object_id struct.
That hash field is exactly 32 bytes long (the size we need for sha256).
But the string literal above is actually 33 bytes long due to the NUL
terminator. This is legal in C, and the NUL is ignored.

  Side note on legality: in general excess initializer elements are
  forbidden, and gcc will warn on both of these:

    char foo[3] = { 'h', 'u', 'g', 'e' };
    char bar[3] = "VeryLongString";

  I couldn't find specific language in the standard allowing
  initialization from a string literal where _just_ the NUL is ignored,
  but C99 section 6.7.8 (Initialization), paragraph 32 shows this exact
  case as "example 8".

However, the upcoming gcc 15 will start warning for this case (when
compiled with -Wextra via DEVELOPER=1):

      CC object-file.o
  object-file.c:52:9: warning: initializer-string for array of ‘unsigned char’ is too long [-Wunterminated-string-initialization]
     52 |         "\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  object-file.c:79:17: note: in expansion of macro ‘EMPTY_TREE_SHA256_BIN_LITERAL’

which is understandable. Even though this is not a bug for us, since we
do not care about the NUL terminator (and are just using the literal as
a convenient format), it would be easy to accidentally create an array
that was mistakenly unterminated.

We can avoid this warning by switching the initializer to an actual
array of unsigned values. That arguably demonstrates our intent more
clearly anyway.

Reported-by: Sam James <sam@gentoo.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 21:48:47 +09:00
5dac35bbde Makefile: let clar header targets depend on their scripts
The targets that generate clar headers depend on their source files, but
not on the script that is actually generating the output. Fix the issue
by adding the missing dependencies.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:59:26 +09:00
8caa7b9b05 cmake: use verbatim arguments when invoking clar commands
Pass the VERBATIM option to `add_custom_command()`. Like this, all
arguments to the commands will be escaped properly for the build tool so
that the invoked command receives each argument unchanged.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:59:26 +09:00
8839dccc8d cmake: use SH_EXE to execute clar scripts
In 30bf9f0aaa (cmake: set up proper dependencies for generated clar
headers, 2024-10-21), we have deduplicated the logic to generate our
clar headers by reusing the same scripts that our Makefile does. Despite
the deduplication, this refactoring also made us rebuild the headers in
case the source files change, which didn't happen previously.

The commit also introduced an issue though: we execute the scripts
directly, so when the host does not have "/bin/sh" available they will
fail. This is for example the case on Windows when importing the CMake
project into Microsoft Visual Studio.

Address the issue by invoking the scripts with `SH_EXE`, which contains
the discovered path of the shell interpreter.

While at it, wrap the overly long lines in the CMake build instructions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:59:25 +09:00
9a91ab9400 t/unit-tests: convert "clar-generate.awk" into a shell script
Convert "clar-generate.awk" into a shell script that invokes awk(1).
This allows us to avoid the shell redirect in the build system, which
may otherwise be a problem with build systems on platforms that use a
different shell.

While at it, wrap the overly long lines in the CMake build instructions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:59:25 +09:00
820fd1a569 Documentation/git-bundle.txt: discuss naïve backups
It might be naïve to think that those who need this education would end
up here in the first place.  But I think it’s good to mention this
high-level concept here on a command which provides a backup strategy.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:56:26 +09:00
c43a67f83d Documentation/git-bundle.txt: mention --all in spec. refs
Mention `--all` as an alternative in “Specifying References”.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:56:25 +09:00
f27b48d904 Documentation/git-bundle.txt: remove old --all example
We don’t need this part now that we have a fleshed-out `--all` example.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:56:25 +09:00
df0cf6faad Documentation/git-bundle.txt: mention full backup example
Provide an example about how to make a “full backup” with caveats about
what that means in this case.

This is a requested use-case.[1]  But the doc is a bit unassuming
about it:

    If you want to match `git clone --mirror`, which would include your
    refs such as `refs/remotes/*`, use `--all`.

The user cannot be expected to formulate “I want a full backup” as “I
want to match `git clone --mirror`” for a bundle file or something.
Let’s drop this mention of `--all` later in the doc and frontload it.

† 1: E.g.:

    • https://stackoverflow.com/questions/5578270/fully-backup-a-git-repohttps://stackoverflow.com/questions/11792671/how-to-git-bundle-a-complete-repo

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:56:25 +09:00
639cd8db63 reflog: rename unreachable
In C23, "unreachable" is a macro that invokes undefined behavior if it
is invoked.  To make sure that our code compiles on a variety of C
versions, rename unreachable to "is_unreachable".

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:42:08 +09:00
e8b3bcf491 index-pack: rename struct thread_local
"thread_local" is a keyword in C23.  To make sure that our code compiles
on a wide variety of C versions, rename struct thread_local to "struct
thread_local_data" to avoid a conflict.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:42:08 +09:00
68e3c69efa Documentation/glossary: describe "trailer"
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:41:24 +09:00
090d24e9af Clean up RelNotes for 2.48
There somehow ended up too many bogus "merge X later to maint"
comments for topics that cannot be merged ever down to 'maint'
because they were forked from more recent integration branches
in the draft release notes.  Remove them, as they are inviting
for mistakes later.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-16 02:27:40 +09:00
0ffb5a6bf1 Allow cloning from repositories owned by another user
Historically, Git has allowed users to clone from an untrusted
repository, and we have documented that this is safe to do so:

    `upload-pack` tries to avoid any dangerous configuration options or
    hooks from the repository it's serving, making it safe to clone an
    untrusted directory and run commands on the resulting clone.

However, this was broken by f4aa8c8bb1 ("fetch/clone: detect dubious
ownership of local repositories", 2024-04-10) in an attempt to make
things more secure.  That change resulted in a variety of problems when
cloning locally and over SSH, but it did not change the stated security
boundary.  Because the security boundary has not changed, it is safe to
adjust part of the code that patch introduced.

To do that and restore the previous functionality, adjust enter_repo to
take two flags instead of one.

The two bits are

 - ENTER_REPO_STRICT: callers that require exact paths (as opposed
   to allowing known suffixes like ".git", ".git/.git" to be
   omitted) can set this bit.  Corresponds to the "strict" parameter
   that the flags word replaces.

 - ENTER_REPO_ANY_OWNER_OK: callers that are willing to run without
   ownership check can set this bit.

The former is --strict-paths option of "git daemon".  The latter is
set only by upload-pack, which honors the claimed security boundary.

Note that local clones across ownership boundaries require --no-local so
that upload-pack is used.  Document this fact in the manual page and
provide an example.

This patch was based on one written by Junio C Hamano.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-15 11:05:06 +09:00
e199290592 pack-objects: only perform verbatim reuse on the preferred pack
When reusing objects from source pack(s), write_reused_pack_verbatim()
is responsible for reusing objects whole eword_t's at a time. It works
by taking the longest continuous run of objects from the beginning of
each source pack that the caller wants, and reuses the entirety of that
section from each pack.

This is based on the assumption that we don't have any gaps within the
region. This assumption relieves us from having to patch any
OFS_DELTAs, since we know that there aren't any gaps between any delta
and its base in that region.

To illustrate why this assumption is necessary, suppose we have some
pack P, which has objects X, Y, and Z. If the MIDX's copy of Y was
selected from a pack other than P, then the bit corresponding to object
Y will appear earlier in the bitmap than the bits corresponding to X and
Z.

If pack-objects already has or will use the copy of Y from the pack it
was selected from in the MIDX, then it is an error to reuse all objects
between X and Z in the source pack. Doing so will cause us to reuse Y
from a different pack than the one which represents Y in the MIDX,
causing us to either:

 - include the object twice, assuming that the caller wants Y in the
   pack, or

 - include the object once, resulting in us packing more objects than
   necessary.

This regression comes from ca0fd69e37 (pack-objects: prepare
`write_reused_pack_verbatim()` for multi-pack reuse, 2023-12-14), which
incorrectly assumed that there would be no gaps in reusable regions of
non-preferred packs.

Instead, we can only safely perform the whole-word reuse optimization on
the preferred pack, where we know with certainty that no gaps exist in
that region of the bitmap. We can still reuse objects from non-preferred
packs, but we have to inspect them individually in write_reused_pack()
to ensure that any gaps that may exist are accounted for.

This allows us to simplify the implementation of
write_reused_pack_verbatim() back to almost its pre-multi-pack reuse
form, since we can now assume that the beginning of the pack appears at
the beginning of the bitmap, meaning that we don't have to account for
any bits up to the first word boundary (like we had to special case in
ca0fd69e37).

The only significant changes from the pre-ca0fd69e37 implementation are:

 - that we can no longer inspect words up to the end of
   reuse_packfile_bitmap->word_alloc, since we only want to look at
   words whose bits all correspond to objects in the given packfile, and

 - that we return early when given a reuse_packfile which is not
   preferred, making the call a noop.

In the future, it might be possible to restore this optimization if we
could guarantee that some reuse packs don't contain any gaps by
construction (similar to the "disjoint packs" idea in very early
versions of multi-pack reuse).

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-15 09:13:31 +09:00