Commit Graph

113 Commits

Author SHA1 Message Date
4216329457 Merge branch 'ps/no-writable-strings'
Building with "-Werror -Wwrite-strings" is now supported.

* ps/no-writable-strings: (27 commits)
  config.mak.dev: enable `-Wwrite-strings` warning
  builtin/merge: always store allocated strings in `pull_twohead`
  builtin/rebase: always store allocated string in `options.strategy`
  builtin/rebase: do not assign default backend to non-constant field
  imap-send: fix leaking memory in `imap_server_conf`
  imap-send: drop global `imap_server_conf` variable
  mailmap: always store allocated strings in mailmap blob
  revision: always store allocated strings in output encoding
  remote-curl: avoid assigning string constant to non-const variable
  send-pack: always allocate receive status
  parse-options: cast long name for OPTION_ALIAS
  http: do not assign string constant to non-const field
  compat/win32: fix const-correctness with string constants
  pretty: add casts for decoration option pointers
  object-file: make `buf` parameter of `index_mem()` a constant
  object-file: mark cached object buffers as const
  ident: add casts for fallback name and GECOS
  entry: refactor how we remove items for delayed checkouts
  line-log: always allocate the output prefix
  line-log: stop assigning string constant to file parent buffer
  ...
2024-06-17 15:55:58 -07:00
40a163f217 Merge branch 'ps/ref-storage-migration'
A new command has been added to migrate a repository that uses the
files backend for its ref storage to use the reftable backend, with
limitations.

* ps/ref-storage-migration:
  builtin/refs: new command to migrate ref storage formats
  refs: implement logic to migrate between ref storage formats
  refs: implement removal of ref storages
  worktree: don't store main worktree twice
  reftable: inline `merged_table_release()`
  refs/files: fix NULL pointer deref when releasing ref store
  refs/files: extract function to iterate through root refs
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs: allow to skip creation of reflog entries
  refs: pass storage format to `ref_store_init()` explicitly
  refs: convert ref storage format to an enum
  setup: unset ref storage when reinitializing repository version
2024-06-17 15:55:55 -07:00
e7da938570 global: introduce USE_THE_REPOSITORY_VARIABLE macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
9da95bda74 hash: require hash algorithm in oidread() and oidclr()
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash
function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00
092b33da2b Merge branch 'ps/ref-storage-migration' into ps/use-the-repository
* ps/ref-storage-migration:
  builtin/refs: new command to migrate ref storage formats
  refs: implement logic to migrate between ref storage formats
  refs: implement removal of ref storages
  worktree: don't store main worktree twice
  reftable: inline `merged_table_release()`
  refs/files: fix NULL pointer deref when releasing ref store
  refs/files: extract function to iterate through root refs
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs: allow to skip creation of reflog entries
  refs: pass storage format to `ref_store_init()` explicitly
  refs: convert ref storage format to an enum
  setup: unset ref storage when reinitializing repository version
2024-06-13 09:39:08 -07:00
23c32511b3 refs/reftable: stop micro-optimizing refname allocations on copy
When copying refs, we execute `write_copy_table()` to write the new
table. As the names are given to us via `arg->newname` and
`arg->oldname`, respectively, we optimize away some allocations by
assigning those fields to the reftable records we are about to write
directly, without duplicating them. This requires us to cast the input
to `char *` pointers as they are in fact constant strings. Later on, we
then unset the refname for all of the records before calling
`reftable_log_record_release()` on them.

We also do this when assigning the "HEAD" constant, but here we do not
cast because its type is `char[]` by default. It's about to be turned
into `const char *` though once we enable `-Wwrite-strings` and will
thus cause another warning.

It's quite dubious whether this micro-optimization really helps. We're
about to write to disk anyway, which is going to be way slower than a
small handful of allocations. Let's drop the optimization altogther and
instead copy arguments to simplify the code and avoid the future warning
with `-Wwrite-strings`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07 10:30:48 -07:00
f1dcdd6deb reftable: pick either 'oid' or 'target' for new updates
When creating a reference transaction update, we can provide the old/new
oid/target for the update. We have checks in place to ensure that for
each old/new, either oid or target is set and not both.

In the reftable backend, when dealing with updates without the
`REF_NO_DEREF` flag, we don't selectively propagate data as needed.
Since there are no active users of the path, this is not caught. As we
want to introduce the 'symref-update' command in the upcoming commit,
which would use this flow, correct it.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07 10:25:45 -07:00
aa6e99f122 refs: specify error for regular refs with old_target
When a reference update tries to update a symref, but the ref in
question is actually a regular ref, we raise an error. However the error
raised in this situation is:

  verifying symref target: '<ref>': reference is missing but expected <old-target>

which is very generic and doesn't indicate the mismatch of types. Let's
make this error more specific:

  cannot lock ref '<ref>': expected symref with target '<old-target>': but is a regular ref

so that users have a clearer understanding.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07 10:25:44 -07:00
aba381c090 refs: create and use ref_update_expects_existing_old_ref()
The files and reftable backend, need to check if a ref must exist, so
that the required validation can be done. A ref must exist only when the
`old_oid` value of the update has been explicitly set and it is not the
`null_oid` value.

Since we also support symrefs now, we need to ensure that even when
`old_target` is set a ref must exist. While this was missed when we
added symref support in transactions, there are no active users of this
path. As we introduce the 'symref-verify' command in the upcoming
commits, it is important to fix this.

So let's export this to a function called
`ref_update_expects_existing_old_ref()` and expose it internally via
'refs-internal.h'.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07 10:25:44 -07:00
64a6dd8ffc refs: implement removal of ref storages
We're about to introduce logic to migrate ref storages. One part of the
migration will be to delete the files that are part of the old ref
storage format. We don't yet have a way to delete such data generically
across ref backends though.

Implement a new `delete` callback and expose it via a new
`ref_storage_delete()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 09:04:33 -07:00
fbd1a693c7 refs: allow to skip creation of reflog entries
The ref backends do not have any way to disable the creation of reflog
entries. This will be required for upcoming ref format migration logic
so that we do not create any entries that didn't exist in the original
ref database.

Provide a new `REF_SKIP_CREATE_REFLOG` flag that allows the caller to
disable reflog entry creation.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 09:04:31 -07:00
988499e295 Merge branch 'ps/refs-without-the-repository-updates'
Further clean-up the refs subsystem to stop relying on
the_repository, and instead use the repository associated to the
ref_store object.

* ps/refs-without-the-repository-updates:
  refs/packed: remove references to `the_hash_algo`
  refs/files: remove references to `the_hash_algo`
  refs/files: use correct repository
  refs: remove `dwim_log()`
  refs: drop `git_default_branch_name()`
  refs: pass repo when peeling objects
  refs: move object peeling into "object.c"
  refs: pass ref store when detecting dangling symrefs
  refs: convert iteration over replace refs to accept ref store
  refs: retrieve worktree ref stores via associated repository
  refs: refactor `resolve_gitlink_ref()` to accept a repository
  refs: pass repo when retrieving submodule ref store
  refs: track ref stores via strmap
  refs: implement releasing ref storages
  refs: rename `init_db` callback to avoid confusion
  refs: adjust names for `init` and `init_db` callbacks
2024-05-30 14:15:13 -07:00
67ce50ba26 Merge branch 'ps/reftable-reusable-iterator'
Code clean-up to make the reftable iterator closer to be reusable.

* ps/reftable-reusable-iterator:
  reftable/merged: adapt interface to allow reuse of iterators
  reftable/stack: provide convenience functions to create iterators
  reftable/reader: adapt interface to allow reuse of iterators
  reftable/generic: adapt interface to allow reuse of iterators
  reftable/generic: move seeking of records into the iterator
  reftable/merged: simplify indices for subiterators
  reftable/merged: split up initialization and seeking of records
  reftable/reader: set up the reader when initializing table iterator
  reftable/reader: inline `reader_seek_internal()`
  reftable/reader: separate concerns of table iter and reftable reader
  reftable/reader: unify indexed and linear seeking
  reftable/reader: avoid copying index iterator
  reftable/block: use `size_t` to track restart point index
2024-05-30 14:15:12 -07:00
23528d352a Merge branch 'ps/reftable-write-options'
The knobs to tweak how reftable files are written have been made
available as configuration variables.

* ps/reftable-write-options:
  refs/reftable: allow configuring geometric factor
  reftable: make the compaction factor configurable
  refs/reftable: allow disabling writing the object index
  refs/reftable: allow configuring restart interval
  reftable: use `uint16_t` to track restart interval
  refs/reftable: allow configuring block size
  reftable/dump: support dumping a table's block structure
  reftable/writer: improve error when passed an invalid block size
  reftable/writer: drop static variable used to initialize strbuf
  reftable: pass opts as constant pointer
  reftable: consistently refer to `reftable_write_options` as `opts`
2024-05-30 14:15:11 -07:00
16a592f132 Merge branch 'ps/pseudo-ref-terminology'
Terminology to call various ref-like things are getting
straightened out.

* ps/pseudo-ref-terminology:
  refs: refuse to write pseudorefs
  ref-filter: properly distinuish pseudo and root refs
  refs: pseudorefs are no refs
  refs: classify HEAD as a root ref
  refs: do not check ref existence in `is_root_ref()`
  refs: rename `is_special_ref()` to `is_pseudo_ref()`
  refs: rename `is_pseudoref()` to `is_root_ref()`
  Documentation/glossary: define root refs as refs
  Documentation/glossary: clarify limitations of pseudorefs
  Documentation/glossary: redefine pseudorefs as special refs
2024-05-28 11:17:06 -07:00
939d49e9bd Merge branch 'kn/ref-transaction-symref' into kn/update-ref-symref
* kn/ref-transaction-symref:
  refs: remove `create_symref` and associated dead code
  refs: rename `refs_create_symref()` to `refs_update_symref()`
  refs: use transaction in `refs_create_symref()`
  refs: add support for transactional symref updates
  refs: move `original_update_refname` to 'refs.c'
  refs: support symrefs in 'reference-transaction' hook
  files-backend: extract out `create_symref_lock()`
  refs: accept symref values in `ref_transaction_update()`
2024-05-23 09:38:59 -07:00
0ff6d23a0f Merge branch 'ps/pseudo-ref-terminology' into ps/ref-storage-migration
* ps/pseudo-ref-terminology:
  refs: refuse to write pseudorefs
  ref-filter: properly distinuish pseudo and root refs
  refs: pseudorefs are no refs
  refs: classify HEAD as a root ref
  refs: do not check ref existence in `is_root_ref()`
  refs: rename `is_special_ref()` to `is_pseudo_ref()`
  refs: rename `is_pseudoref()` to `is_root_ref()`
  Documentation/glossary: define root refs as refs
  Documentation/glossary: clarify limitations of pseudorefs
  Documentation/glossary: redefine pseudorefs as special refs
2024-05-23 09:14:32 -07:00
e55f364398 Merge branch 'ps/refs-without-the-repository-updates' into ps/ref-storage-migration
* ps/refs-without-the-repository-updates:
  refs/packed: remove references to `the_hash_algo`
  refs/files: remove references to `the_hash_algo`
  refs/files: use correct repository
  refs: remove `dwim_log()`
  refs: drop `git_default_branch_name()`
  refs: pass repo when peeling objects
  refs: move object peeling into "object.c"
  refs: pass ref store when detecting dangling symrefs
  refs: convert iteration over replace refs to accept ref store
  refs: retrieve worktree ref stores via associated repository
  refs: refactor `resolve_gitlink_ref()` to accept a repository
  refs: pass repo when retrieving submodule ref store
  refs: track ref stores via strmap
  refs: implement releasing ref storages
  refs: rename `init_db` callback to avoid confusion
  refs: adjust names for `init` and `init_db` callbacks
2024-05-23 09:14:08 -07:00
4beb7a3b06 Merge branch 'kn/ref-transaction-symref'
Updates to symbolic refs can now be made as a part of ref
transaction.

* kn/ref-transaction-symref:
  refs: remove `create_symref` and associated dead code
  refs: rename `refs_create_symref()` to `refs_update_symref()`
  refs: use transaction in `refs_create_symref()`
  refs: add support for transactional symref updates
  refs: move `original_update_refname` to 'refs.c'
  refs: support symrefs in 'reference-transaction' hook
  files-backend: extract out `create_symref_lock()`
  refs: accept symref values in `ref_transaction_update()`
2024-05-20 11:20:04 -07:00
30aaff437f refs: pass repo when peeling objects
Both `peel_object()` and `peel_iterated_oid()` implicitly rely on
`the_repository` to look up objects. Despite the fact that we want to
get rid of `the_repository`, it also leads to some restrictions in our
ref iterators when trying to retrieve the peeled value for a repository
other than `the_repository`.

Refactor these functions such that both take a repository as argument
and remove the now-unnecessary restrictions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:39 -07:00
71c871b48d refs: implement releasing ref storages
Ref storages are typically only initialized once for `the_repository`
and then never released. Until now we got away with that without causing
memory leaks because `the_repository` stays reachable, and because the
ref backend is reachable via `the_repository` its memory basically never
leaks.

This is about to change though because of the upcoming migration logic,
which will create a secondary ref storage. In that case, we will either
have to release the old or new ref storage to avoid leaks.

Implement a new `release` callback and expose it via a new
`ref_storage_release()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:37 -07:00
ed93ea1602 refs: rename init_db callback to avoid confusion
Reference backends have two callbacks `init` and `init_db`. The
similarity of these two callbacks has repeatedly confused me whenever I
was looking at them, where I always had to look up which of them does
what.

Rename the `init_db` callback to `create_on_disk`, which should
hopefully be clearer.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:36 -07:00
31951c2248 refs: classify HEAD as a root ref
Root refs are those refs that live in the root of the ref hierarchy.
Our old and venerable "HEAD" reference falls into this category, but we
don't yet classify it as such in `is_root_ref()`.

Adapt the function to also treat "HEAD" as a root ref. This change is
safe to do for all current callers:

  - `ref_kind_from_refname()` already handles "HEAD" explicitly before
    calling `is_root_ref()`.

  - The "files" and "reftable" backends explicitly call both
    `is_root_ref()` and `is_headref()` together.

This also aligns behaviour or `is_root_ref()` and `is_headref()` such
that we stop checking for ref existence. This changes semantics for our
backends:

  - In the reftable backend we already know that the ref must exist
    because `is_headref()` is called as part of the ref iterator. The
    existence check is thus redundant, and the change is safe to do.

  - In the files backend we use it when populating root refs, where we
    would skip adding the "HEAD" file if it was not possible to resolve
    it. The new behaviour is to instead mark "HEAD" as broken, which
    will cause us to emit warnings in various places.

As there are no callers of `is_headref()` left afer the refactoring, we
can absorb it completely into `is_root_ref()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 07:30:52 -07:00
afcd067dad refs: do not check ref existence in is_root_ref()
Before this patch series, root refs except for "HEAD" and our special
refs were classified as pseudorefs. Furthermore, our terminology
clarified that pseudorefs must not be symbolic refs. This restriction
is enforced in `is_root_ref()`, which explicitly checks that a supposed
root ref resolves to an object ID without recursing.

This has been extremely confusing right from the start because (in old
terminology) a ref name may sometimes be a pseudoref and sometimes not
depending on whether it is a symbolic or regular ref. This behaviour
does not seem reasonable at all and I very much doubt that it results in
anything sane.

Last but not least, the current behaviour can actually lead to a
segfault when calling `is_root_ref()` with a reference that either does
not exist or that is a symbolic ref because we never initialized `oid`,
but then read it via `is_null_oid()`.

We have now changed terminology to clarify that pseudorefs are really
only "MERGE_HEAD" and "FETCH_HEAD", whereas all the other refs that live
in the root of the ref hierarchy are just plain refs. Thus, we do not
need to check whether the ref is symbolic or not. In fact, we can now
avoid looking up the ref completely as the name is sufficient for us to
figure out whether something would be a root ref or not.

This change of course changes semantics for our callers. As there are
only three of them we can assess each of them individually:

  - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs.
    It's clear that the intent is to classify based on the ref name,
    only.

  - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to
    filter root refs. Again, using existence checks is pointless here as
    the iterator has just surfaced the ref, so we know it does exist.

  - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to
    determine whether it should add a ref to the root directory of its
    iterator. This had the effect that we skipped over any files that
    are either a symbolic ref, or which are not a ref at all.

    The new behaviour is to include symbolic refs know, which aligns us
    with the adapted terminology. Furthermore, files which look like
    root refs but aren't are now mark those as "broken". As broken refs
    are not surfaced by our tooling, this should not lead to a change in
    user-visible behaviour, but may cause us to emit warnings. This
    feels like the right thing to do as we would otherwise just silently
    ignore corrupted root refs completely.

So in all cases the existence check was either superfluous, not in line
with the adapted terminology or masked potential issues. This commit
thus changes the behaviour as proposed and drops the existence check
altogether.

Add a test that verifies that this does not change user-visible
behaviour. Namely, we still don't want to show broken refs to the user
by default in git-for-each-ref(1). What this does allow though is for
internal callers to surface dangling root refs when they pass in the
`DO_FOR_EACH_INCLUDE_BROKEN` flag.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 07:30:52 -07:00
f6936e62a5 refs: rename is_pseudoref() to is_root_ref()
Rename `is_pseudoref()` to `is_root_ref()` to adapt to the newly defined
terminology in our gitglossary(7).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 07:30:51 -07:00
08efe69212 reftable/stack: provide convenience functions to create iterators
There exist a bunch of call sites in the reftable backend that want to
create iterators for a reftable stack. This is rather convoluted right
now, where you always have to go via the merged table. And it is about
to become even more convoluted when we split up iterator initialization
and seeking in the next commit.

Introduce convenience functions that allow the caller to create an
iterator from a reftable stack directly without going through the merged
table. Adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:04:19 -07:00
f518d91a2b refs/reftable: allow configuring geometric factor
Allow configuring the geometric factor used by the auto-compaction
algorithm whenever a new table is appended to the stack of tables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:02:39 -07:00
afbdbfae0b refs/reftable: allow disabling writing the object index
Besides the expected "ref" and "log" records, the reftable library also
writes "obj" records. These are basically a reverse mapping of object
IDs to their respective ref records so that it becomes efficient to
figure out which references point to a specific object. The motivation
for this data structure is the "uploadpack.allowTipSHA1InWant" config,
which allows a client to fetch any object by its hash that has a ref
pointing to it.

This reverse index is not used by Git at all though, and the expectation
is that most hosters nowadays use "uploadpack.allowAnySHA1InWant". It
may thus be preferable for many users to disable writing these optional
object indices altogether to safe some precious disk space.

Add a new config "reftable.indexObjects" that allows the user to disable
the object index altogether.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:02:38 -07:00
90db611c2a refs/reftable: allow configuring restart interval
Add a new option `reftable.restartInterval` that allows the user to
control the restart interval when writing reftable records used by the
reftable library.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:02:38 -07:00
831b366c24 refs/reftable: allow configuring block size
Add a new option `reftable.blockSize` that allows the user to control
the block size used by the reftable library.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:02:38 -07:00
799237852b reftable: pass opts as constant pointer
We sometimes pass the refatble write options as value and sometimes as a
pointer. This is quite confusing and makes the reader wonder whether the
options get modified sometimes.

In fact, `reftable_new_writer()` does cause the caller-provided options
to get updated when some values aren't set up. This is quite unexpected,
but didn't cause any harm until now.

Adapt the code so that we do not modify the caller-provided values
anymore. While at it, refactor the code to code to consistently pass the
options as a constant pointer to clarify that the caller-provided opts
will not ever get modified.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:02:38 -07:00
5aec7231c8 Merge branch 'ps/reftable-write-optim'
Code to write out reftable has seen some optimization and
simplification.

* ps/reftable-write-optim:
  reftable/block: reuse compressed array
  reftable/block: reuse zstream when writing log blocks
  reftable/writer: reset `last_key` instead of releasing it
  reftable/writer: unify releasing memory
  reftable/writer: refactorings for `writer_flush_nonempty_block()`
  reftable/writer: refactorings for `writer_add_record()`
  refs/reftable: don't recompute committer ident
  reftable: remove name checks
  refs/reftable: skip duplicate name checks
  refs/reftable: perform explicit D/F check when writing symrefs
  refs/reftable: fix D/F conflict error message on ref copy
2024-05-08 10:18:43 -07:00
4865707bda refs: remove create_symref and associated dead code
In the previous commits, we converted `refs_create_symref()` to utilize
transactions to perform symref updates. Earlier `refs_create_symref()`
used `create_symref()` to do the same.

We can now remove `create_symref()` and any code associated with it
which is no longer used. We remove `create_symref()` code from all the
reference backends and also remove it entirely from the `ref_storage_be`
struct.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 08:51:50 -07:00
644daf7785 refs: add support for transactional symref updates
The reference backends currently support transactional reference
updates. While this is exposed to users via 'git-update-ref' and its
'--stdin' mode, it is also used internally within various commands.

However, we do not support transactional updates of symrefs. This commit
adds support for symrefs in both the 'files' and the 'reftable' backend.

Here, we add and use `ref_update_has_null_new_value()`, a helper
function which is used to check if there is a new_value in a reference
update. The new value could either be a symref target `new_target` or a
OID `new_oid`.

We also add another common function `ref_update_check_old_target` which
will be used to check if the update's old_target corresponds to a
reference's current target.

Now transactional updates (verify, create, delete, update) can be used
for:
- regular refs
- symbolic refs
- conversion of regular to symbolic refs and vice versa

This also allows us to expose this to users via new commands in
'git-update-ref' in the future.

Note that a dangling symref update does not record a new reflog entry,
which is unchanged before and after this commit.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 08:51:49 -07:00
e9965ba477 refs: move original_update_refname to 'refs.c'
The files backend and the reftable backend implement
`original_update_refname` to obtain the original refname of the update.
Move it out to 'refs.c' and only expose it internally to the refs
library. This will be used in an upcoming commit to also introduce
another common functionality for the two backends.

We also rename the function to `ref_update_original_update_refname` to
keep it consistent with the upcoming other 'ref_update_*' functions
that'll be introduced.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 08:51:49 -07:00
1bc4cc3fc4 refs: accept symref values in ref_transaction_update()
The function `ref_transaction_update()` obtains ref information and
flags to create a `ref_update` and add them to the transaction at hand.

To extend symref support in transactions, we need to also accept the
old and new ref targets and process it. This commit adds the required
parameters to the function and modifies all call sites.

The two parameters added are `new_target` and `old_target`. The
`new_target` is used to denote what the reference should point to when
the transaction is applied. Some functions allow this parameter to be
NULL, meaning that the reference is not changed.

The `old_target` denotes the value the reference must have before the
update. Some functions allow this parameter to be NULL, meaning that the
old value of the reference is not checked.

We also update the internal function `ref_transaction_add_update()`
similarly to take the two new parameters.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 08:51:49 -07:00
eacfd581d2 Merge branch 'ps/pack-refs-auto'
"git pack-refs" learned the "--auto" option, which is a useful
addition to be triggered from "git gc --auto".

Acked-by: Karthik Nayak <karthik.188@gmail.com>
cf. <CAOLa=ZRAEA7rSUoYL0h-2qfEELdbPHbeGpgBJRqesyhHi9Q6WQ@mail.gmail.com>

* ps/pack-refs-auto:
  builtin/gc: pack refs when using `git maintenance run --auto`
  builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs
  t6500: extract objects with "17" prefix
  builtin/gc: move `struct maintenance_run_opts`
  builtin/pack-refs: introduce new "--auto" flag
  builtin/pack-refs: release allocated memory
  refs/reftable: expose auto compaction via new flag
  refs: remove `PACK_REFS_ALL` flag
  refs: move `struct pack_refs_opts` to where it's used
  t/helper: drop pack-refs wrapper
  refs/reftable: print errors on compaction failure
  reftable/stack: gracefully handle failed auto-compaction due to locks
  reftable/stack: use error codes when locking fails during compaction
  reftable/error: discern locked/outdated errors
  reftable/stack: fix error handling in `reftable_stack_init_addition()`
2024-04-09 14:31:45 -07:00
44afd85fbd refs/reftable: don't recompute committer ident
In order to write reflog entries we need to compute the committer's
identity as it gets encoded in the log record itself. The reftable
backend does this via `git_committer_info()` and `split_ident_line()` in
`fill_reftable_log_record()`, which use the Git config as well as
environment variables to figure out the identity.

While most callers would only call `fill_reftable_log_record()` once or
twice, `write_transaction_table()` will call it as many times as there
are queued ref updates. This can be quite a waste of effort when writing
many refs with reflog entries in a single transaction.

Refactor the code to pre-compute the committer information. This results
in a small speedup when writing 100000 refs in a single transaction:

  Benchmark 1: update-ref: create many refs (HEAD~)
    Time (mean ± σ):      2.895 s ±  0.020 s    [User: 1.516 s, System: 1.374 s]
    Range (min … max):    2.868 s …  2.983 s    100 runs

  Benchmark 2: update-ref: create many refs (HEAD)
    Time (mean ± σ):      2.845 s ±  0.017 s    [User: 1.461 s, System: 1.379 s]
    Range (min … max):    2.803 s …  2.913 s    100 runs

  Summary
    update-ref: create many refs (HEAD) ran
      1.02 ± 0.01 times faster than update-ref: create many refs (HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 17:01:41 -07:00
485c63cf5c reftable: remove name checks
In the preceding commit we have disabled name checks in the "reftable"
backend. These checks were responsible for verifying multiple things
when writing records to the reftable stack:

  - Detecting file/directory conflicts. Starting with the preceding
    commits this is now handled by the reftable backend itself via
    `refs_verify_refname_available()`.

  - Validating refnames. This is handled by `check_refname_format()` in
    the generic ref transacton layer.

The code in the reftable library is thus not used anymore and likely to
bitrot over time. Remove it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 17:01:41 -07:00
4af31dc84a refs/reftable: skip duplicate name checks
All the callback functions which write refs in the reftable backend
perform D/F conflict checks via `refs_verify_refname_available()`. But
in reality we perform these D/F conflict checks a second time in the
reftable library via `stack_check_addition()`.

Interestingly, the code in the reftable library is inferior compared to
the generic function:

  - It is slower than `refs_verify_refname_available()`, even though
    this can probably be optimized.

  - It does not provide a proper error message to the caller, and thus
    all the user would see is a generic "file/directory conflict"
    message.

Disable the D/F conflict checks in the reftable library by setting the
`skip_name_check` write option. This results in a non-negligible speedup
when writing many refs. The following benchmark writes 100k refs in a
single transaction:

  Benchmark 1: update-ref: create many refs (HEAD~)
    Time (mean ± σ):      3.241 s ±  0.040 s    [User: 1.854 s, System: 1.381 s]
    Range (min … max):    3.185 s …  3.454 s    100 runs

  Benchmark 2: update-ref: create many refs (HEAD)
    Time (mean ± σ):      2.878 s ±  0.024 s    [User: 1.506 s, System: 1.367 s]
    Range (min … max):    2.838 s …  2.960 s    100 runs

  Summary
    update-ref: create many refs (HEAD~) ran
      1.13 ± 0.02 times faster than update-ref: create many refs (HEAD)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 16:59:02 -07:00
455d61b6d2 refs/reftable: perform explicit D/F check when writing symrefs
We already perform explicit D/F checks in all reftable callbacks which
write refs, except when writing symrefs. For one this leads to an error
message which isn't perfectly actionable because we only tell the user
that there was a D/F conflict, but not which refs conflicted with each
other. But second, once all ref updating callbacks explicitly check for
D/F conflicts, we can disable the D/F checks in the reftable library
itself and thus avoid some duplicated efforts.

Refactor the code that writes symref tables to explicitly call into
`refs_verify_refname_available()` when writing symrefs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 16:59:01 -07:00
f57cc987a9 refs/reftable: fix D/F conflict error message on ref copy
The `write_copy_table()` function is shared between the reftable
implementations for renaming and copying refs. The only difference
between those two cases is that the rename will also delete the old
reference, whereas copying won't.

This has resulted in a bug though where we don't properly verify refname
availability. When calling `refs_verify_refname_available()`, we always
add the old ref name to the list of refs to be skipped when computing
availability, which indicates that the name would be available even if
it already exists at the current point in time. This is only the right
thing to do for renames though, not for copies.

The consequence of this bug is quite harmless because the reftable
backend has its own checks for D/F conflicts further down in the call
stack, and thus we refuse the update regardless of the bug. But all the
user gets in this case is an uninformative message that copying the ref
has failed, without any further details.

Fix the bug and only add the old name to the skip-list in case we rename
the ref. Consequently, this error case will now be handled by
`refs_verify_refname_available()`, which knows to provide a proper error
message.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 16:59:01 -07:00
7c8eb5928f reftable/stack: add env to disable autocompaction
In future tests it will be neccesary to create repositories with a set
number of tables. To make this easier, introduce the
`GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set
to false, disables autocompaction of reftables.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 12:11:10 -07:00
7424fb7797 Merge branch 'ps/pack-refs-auto' into jt/reftable-geometric-compaction
* ps/pack-refs-auto:
  builtin/gc: pack refs when using `git maintenance run --auto`
  builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs
  t6500: extract objects with "17" prefix
  builtin/gc: move `struct maintenance_run_opts`
  builtin/pack-refs: introduce new "--auto" flag
  builtin/pack-refs: release allocated memory
  refs/reftable: expose auto compaction via new flag
  refs: remove `PACK_REFS_ALL` flag
  refs: move `struct pack_refs_opts` to where it's used
  t/helper: drop pack-refs wrapper
  refs/reftable: print errors on compaction failure
  reftable/stack: gracefully handle failed auto-compaction due to locks
  reftable/stack: use error codes when locking fails during compaction
  reftable/error: discern locked/outdated errors
  reftable/stack: fix error handling in `reftable_stack_init_addition()`
2024-04-05 10:34:23 -07:00
f89356db4a refs/reftable: expose auto compaction via new flag
Under normal circumstances, the "reftable" backend will automatically
perform compaction after appending to the stack. It is thus not
necessary and may even be considered wasteful to run git-pack-refs(1) in
"reftable"-backed repositories as it will cause the backend to compact
all tables into a single one. We do exactly that though when running
`git maintenance run --auto` or `git gc --auto`, which gets spawned by
Git after running some specific commands.

The `--auto` mode is typically only executing optimizations as needed.
To do so, we already use several heuristics for the various different
data structures in Git to determine whether to optimize them or not.
We do not use any heuristics for refs though and instead always optimize
them.

Introduce a new `PACK_REFS_AUTO` flag that can be passed to the backend.
When not handled by the backend we will continue to behave the exact
same as we do right now, that is we optimize refs unconditionally. This
is done for the "files" backend for now to retain current behaviour,
even though we may eventually also want to introduce heuristics here.
For the "reftable" backend though we already do have auto-compaction, so
we can easily reuse that logic to implement the new auto-packing flag.

Note that under normal circumstances, this should always end up being a
no-op. After all, we already invoke the code for every single addition
to the stack. But there are special cases where it can still be helpful
to execute the auto-compaction code explicitly:

  - Concurrent writers may cause compaction to not run due to locks.

  - Callers may decide to disable compaction altogether and then pack
    refs at a later point due to various reasons.

  - Other implementations of the reftable format may do compaction
    differently or even not at all.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 09:54:07 -07:00
4ccf7060d8 refs/reftable: print errors on compaction failure
When git-pack-refs(1) fails in the reftable backend we end up printing
no error message at all, leaving the caller puzzled as to why compaction
has failed. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 09:54:07 -07:00
e8c1cda9a9 Merge branch 'ps/reftable-reflog-iteration-perf'
The code to iterate over reflogs in the reftable has been optimized
to reduce memory allocation and deallocation.

Reviewed-by: Josh Steadmon <steadmon@google.com>
cf. <Ze9eX-aaWoVaqsPP@google.com>

* ps/reftable-reflog-iteration-perf:
  refs/reftable: track last log record name via strbuf
  reftable/record: use scratch buffer when decoding records
  reftable/record: reuse message when decoding log records
  reftable/record: reuse refnames when decoding log records
  reftable/record: avoid copying author info
  reftable/record: convert old and new object IDs to arrays
  refs/reftable: reload correct stack when creating reflog iter
2024-03-21 14:55:13 -07:00
448a74e151 Merge branch 'ps/reftable-iteration-perf-part2'
The code to iterate over refs with the reftable backend has seen
some optimization.

* ps/reftable-iteration-perf-part2:
  refs/reftable: precompute prefix length
  reftable: allow inlining of a few functions
  reftable/record: decode keys in place
  reftable/record: reuse refname when copying
  reftable/record: reuse refname when decoding
  reftable/merged: avoid duplicate pqueue emptiness check
  reftable/merged: circumvent pqueue with single subiter
  reftable/merged: handle subiter cleanup on close only
  reftable/merged: remove unnecessary null check for subiters
  reftable/merged: make subiters own their records
  reftable/merged: advance subiter on subsequent iteration
  reftable/merged: make `merged_iter` structure private
  reftable/pq: use `size_t` to track iterator index
2024-03-14 14:05:23 -07:00
963a277a52 Merge branch 'ps/reftable-repo-init-fix'
Clear the fallout from a fix for 2.44 regression.

* ps/reftable-repo-init-fix:
  t0610: remove unused variable assignment
  refs/reftable: don't fail empty transactions in repo without HEAD
2024-03-07 15:59:42 -08:00
d037212d97 Merge branch 'kn/for-all-refs'
"git for-each-ref" learned "--include-root-refs" option to show
even the stuff outside the 'refs/' hierarchy.

* kn/for-all-refs:
  for-each-ref: add new option to include root refs
  ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR'
  refs: introduce `refs_for_each_include_root_refs()`
  refs: extract out `loose_fill_ref_dir_regular_file()`
  refs: introduce `is_pseudoref()` and `is_headref()`
2024-03-05 09:44:44 -08:00