Rename the `reftable_new_reader()` function to `reftable_reader_new()`
to match our coding guidelines.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The only difference between `stack_compact_range_stats()` and
`stack_compact_range()` is that the former updates stats on failure,
whereas the latter doesn't. There are no callers anymore that do not
want their stats updated though, making the indirection unnecessary.
Inline the stat updates into `stack_compact_range()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `reftable_table` interface provides a generic infrastructure that
can abstract away whether the underlying table is a single table, or a
merged table. This abstraction can make it rather hard to reason about
the code. We didn't ever use it to implement the reftable backend, and
with the preceding patches in this patch series we in fact don't use it
at all anymore. Furthermore, it became somewhat useless with the recent
refactorings that made it possible to seek reftable iterators multiple
times, as these now provide generic access to tables for us. The
interface is thus redundant and only brings unnecessary complexity with
it.
Remove the `struct reftable_table` interface and its associated
functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `reftable_stack_print_directory()` into the "dump-reftable" helper.
This follows the same reasoning as the preceding commit.
Note that this requires us to remove the tests for this functionality in
`reftable/stack_test.c`. The test does not really add much anyway,
because all it verifies is that we do not crash or run into an error,
and it specifically doesn't check the outputted data. Also, as the code
is now part of the test helper, it doesn't make much sense to have a
unit test for it in the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To read a reference for the reftable stack, we first create a generic
`reftable_table` from the merged table and then read the reference via a
convenience function. We are about to remove these generic interfaces,
so let's instead open-code the logic to prepare for this removal.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merged table provides access to a reftable stack by merging the
contents of those tables into a virtual table. These subtables are being
tracked via `struct reftable_table`, which is a generic interface for
accessing either a single reftable or a merged reftable. So in theory,
it would be possible for the merged table to merge together other merged
tables.
This is somewhat nonsensical though: we only ever set up a merged table
over normal reftables, and there is no reason to do otherwise. This
generic interface thus makes the code way harder to follow and reason
about than really necessary. The abstraction layer may also have an
impact on performance, even though the extra set of vtable function
calls probably doesn't really matter.
Refactor the merged tables to use a `struct reftable_reader` for each of
the subtables instead, which gives us direct access to the underlying
tables. Adjust names accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `reftable_new_merged_table()` to `reftable_merged_table_new()`
such that the name matches our coding style.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When compacting tables, it may happen that we want to compact a set of
tables which are already locked by a concurrent process that compacts
them. In the case where we wanted to perform a full compaction of all
tables it is sensible to bail out in this case, as we cannot fulfill the
requested action.
But when performing auto-compaction it isn't necessarily in our best
interest of us to abort the whole operation. For example, due to the
geometric compacting schema that we use, it may be that process A takes
a lot of time to compact the bulk of all tables whereas process B
appends a bunch of new tables to the stack. B would in this case also
notice that it has to compact the tables that process A is compacting
already and thus also try to compact the same range, probably including
the new tables it has appended. But because those tables are locked
already, it will fail and thus abort the complete auto-compaction. The
consequence is that the stack will grow longer and longer while A isn't
yet done with compaction, which will lead to a growing performance
impact.
Instead of aborting auto-compaction altogether, let's gracefully handle
this situation by instead compacting tables which aren't locked. To do
so, instead of locking from the beginning of the slice-to-be-compacted,
we start locking tables from the end of the slice. Once we hit the first
table that is locked already, we abort. If we succeeded to lock two or
more tables, then we simply reduce the slice of tables that we're about
to compact to those which we managed to lock.
This ensures that we can at least make some progress for compaction in
said scenario. It also helps in other scenarios, like for example when a
process died and left a stale lockfile behind. In such a case we can at
least ensure some compaction on a best-effort basis.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The locking employed by compaction uses the following schema:
1. Lock "tables.list" and verify that it matches the version we have
loaded in core.
2. Lock each of the tables in the user-supplied range of tables that
we are supposed to compact. These locks prohibit any concurrent
process to compact those tables while we are doing that.
3. Unlock "tables.list". This enables concurrent processes to add new
tables to the stack, but also allows them to compact tables outside
of the range of tables that we have locked.
4. Perform the compaction.
5. Lock "tables.list" again.
6. Move the compacted table into place.
7. Write the new order of tables, including the compacted table, into
the lockfile.
8. Commit the lockfile into place.
Letting concurrent processes modify the "tables.list" file while we are
doing the compaction is very much part of the design and thus expected.
After all, it may take some time to compact tables in the case where we
are compacting a lot of very large tables.
But there is a bug in the code. Suppose we have two processes which are
compacting two slices of the table. Given that we lock each of the
tables before compacting them, we know that the slices must be disjunct
from each other. But regardless of that, compaction performed by one
process will always impact what the other process needs to write to the
"tables.list" file.
Right now, we do not check whether the "tables.list" has been changed
after we have locked it for the second time in (5). This has the
consequence that we will always commit the old, cached in-core tables to
disk without paying to respect what the other process has written. This
scenario would then lead to data loss and corruption.
This can even happen in the simpler case of one compacting process and
one writing process. The newly-appended table by the writing process
would get discarded by the compacting process because it never sees the
new table.
Fix this bug by re-checking whether our stack is still up to date after
locking for the second time. If it isn't, then we adjust the indices of
tables to replace in the updated stack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When modifying "tables.list", we need to lock the list before updating
it to ensure that no concurrent writers modify the list at the same
point in time. While we do this via the `lock_file` subsystem when
compacting the stack, we manually handle the lock when adding a new
table to it. While not wrong, it is at least inconsistent.
Refactor the code to consistently lock "tables.list" via the `lock_file`
subsytem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use `fsync_component_or_die()` when committing an addition to the
"tables.list" lock file, which unsurprisingly dies in case the fsync
fails. Given that this is part of the reftable library, we should never
die and instead let callers handle the error.
Adapt accordingly and use `fsync_component()` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When compacting tables, we store the locks of all tables we are about to
compact in the `table_locks` array. As we currently only ever compact
all tables in the user-provided range or none, we simply track those
locks via the indices of the respective tables in the merged stack.
This is about to change though, as we will introduce a mode where auto
compaction gracefully handles the case of already-locked files. In this
case, it may happen that we only compact a subset of the user-supplied
range of tables. In this case, the indices will not necessarily match
the lock indices anymore.
Refactor the code such that we track the number of locks via a separate
variable. The resulting code is expected to perform the same, but will
make it easier to perform the described change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When auto-compaction fails due to a locking error, we update the
statistics to indicate this failure. We're not doing the same when
performing a full compaction.
Fix this inconsistency by using `stack_compact_range_stats()`, which
handles the stat update for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the function that gathers table sizes to be more idiomatic. For
one, use `REFTABLE_CALLOC_ARRAY()` instead of `reftable_calloc()`.
Second, avoid using an integer to iterate through the tables in the
reftable stack given that `stack_len` itself is using a `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building with "-Werror -Wwrite-strings" is now supported.
* ps/no-writable-strings: (27 commits)
config.mak.dev: enable `-Wwrite-strings` warning
builtin/merge: always store allocated strings in `pull_twohead`
builtin/rebase: always store allocated string in `options.strategy`
builtin/rebase: do not assign default backend to non-constant field
imap-send: fix leaking memory in `imap_server_conf`
imap-send: drop global `imap_server_conf` variable
mailmap: always store allocated strings in mailmap blob
revision: always store allocated strings in output encoding
remote-curl: avoid assigning string constant to non-const variable
send-pack: always allocate receive status
parse-options: cast long name for OPTION_ALIAS
http: do not assign string constant to non-const field
compat/win32: fix const-correctness with string constants
pretty: add casts for decoration option pointers
object-file: make `buf` parameter of `index_mem()` a constant
object-file: mark cached object buffers as const
ident: add casts for fallback name and GECOS
entry: refactor how we remove items for delayed checkouts
line-log: always allocate the output prefix
line-log: stop assigning string constant to file parent buffer
...
A new command has been added to migrate a repository that uses the
files backend for its ref storage to use the reftable backend, with
limitations.
* ps/ref-storage-migration:
builtin/refs: new command to migrate ref storage formats
refs: implement logic to migrate between ref storage formats
refs: implement removal of ref storages
worktree: don't store main worktree twice
reftable: inline `merged_table_release()`
refs/files: fix NULL pointer deref when releasing ref store
refs/files: extract function to iterate through root refs
refs/files: refactor `add_pseudoref_and_head_entries()`
refs: allow to skip creation of reflog entries
refs: pass storage format to `ref_store_init()` explicitly
refs: convert ref storage format to an enum
setup: unset ref storage when reinitializing repository version
We're about to enable `-Wwrite-strings`, which changes the type of
string constants to `const char[]`. Fix various sites where we assign
such constants to non-const variables.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `merged_table_release()` releases a merged table, whereas
`reftable_merged_table_free()` releases a merged table and then also
free's its pointer. But all callsites of `merged_table_release()` are in
fact followed by `reftable_merged_table_free()`, which is redundant.
Inline `merged_table_release()` into `reftable_merged_table_free()` to
get rid of this redundance.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up to make the reftable iterator closer to be reusable.
* ps/reftable-reusable-iterator:
reftable/merged: adapt interface to allow reuse of iterators
reftable/stack: provide convenience functions to create iterators
reftable/reader: adapt interface to allow reuse of iterators
reftable/generic: adapt interface to allow reuse of iterators
reftable/generic: move seeking of records into the iterator
reftable/merged: simplify indices for subiterators
reftable/merged: split up initialization and seeking of records
reftable/reader: set up the reader when initializing table iterator
reftable/reader: inline `reader_seek_internal()`
reftable/reader: separate concerns of table iter and reftable reader
reftable/reader: unify indexed and linear seeking
reftable/reader: avoid copying index iterator
reftable/block: use `size_t` to track restart point index
Refactor the interfaces exposed by `struct reftable_merged_table` and
`struct merged_iter` such that they support iterator reuse. This is done
by separating initialization of the iterator and seeking on it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There exist a bunch of call sites in the reftable backend that want to
create iterators for a reftable stack. This is rather convoluted right
now, where you always have to go via the merged table. And it is about
to become even more convoluted when we split up iterator initialization
and seeking in the next commit.
Introduce convenience functions that allow the caller to create an
iterator from a reftable stack directly without going through the merged
table. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When auto-compacting, the reftable library packs references such that
the sizes of the tables form a geometric sequence. The factor for this
geometric sequence is hardcoded to 2 right now. We're about to expose
this as a config option though, so let's expose the factor via write
options.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes pass the refatble write options as value and sometimes as a
pointer. This is quite confusing and makes the reader wonder whether the
options get modified sometimes.
In fact, `reftable_new_writer()` does cause the caller-provided options
to get updated when some values aren't set up. This is quite unexpected,
but didn't cause any harm until now.
Adapt the code so that we do not modify the caller-provided values
anymore. While at it, refactor the code to code to consistently pass the
options as a constant pointer to clarify that the caller-provided opts
will not ever get modified.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Throughout the reftable library the `reftable_write_options` are
sometimes referred to as `cfg` and sometimes as `opts`. Unify these to
consistently use `opts` to avoid confusion.
While at it, touch up the coding style a bit by removing unneeded braces
around one-line statements and newlines between variable declarations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code to write out reftable has seen some optimization and
simplification.
* ps/reftable-write-optim:
reftable/block: reuse compressed array
reftable/block: reuse zstream when writing log blocks
reftable/writer: reset `last_key` instead of releasing it
reftable/writer: unify releasing memory
reftable/writer: refactorings for `writer_flush_nonempty_block()`
reftable/writer: refactorings for `writer_add_record()`
refs/reftable: don't recompute committer ident
reftable: remove name checks
refs/reftable: skip duplicate name checks
refs/reftable: perform explicit D/F check when writing symrefs
refs/reftable: fix D/F conflict error message on ref copy
In the preceding commit we have disabled name checks in the "reftable"
backend. These checks were responsible for verifying multiple things
when writing records to the reftable stack:
- Detecting file/directory conflicts. Starting with the preceding
commits this is now handled by the reftable backend itself via
`refs_verify_refname_available()`.
- Validating refnames. This is handled by `check_refname_format()` in
the generic ref transacton layer.
The code in the reftable library is thus not used anymore and likely to
bitrot over time. Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables with the same binary log value of size are grouped
into segments. The segment that has both the lowest binary log value and
contains more than one table is set as the starting point when
identifying the compaction segment.
Since segments containing a single table are not initially considered
for compaction, if the table appended to the list does not match the
previous table log value, no compaction occurs for the new table. It is
therefore possible for unbounded growth of the table list. This can be
demonstrated by repeating the following sequence:
git branch -f foo
git branch -d foo
Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.
Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation by individually evaluating each table in reverse index
order. This strategy results in a much simpler and more robust algorithm
compared to the previous one while also maintaining a minimal ordered
set of tables on-disk.
When creating 10 thousand references, the new strategy has no
performance impact:
Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s]
Range (min … max): 26.447 s … 26.569 s 10 runs
Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s]
Range (min … max): 26.366 s … 26.444 s 10 runs
Summary
update-ref: create refs sequentially (revision = HEAD) ran
1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)
Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable stack already has a variable to configure whether or not to
run auto-compaction, but it is inaccessible to users of the library.
There exist use cases where a caller may want to have more control over
auto-compaction.
Move the `disable_auto_compact` option into `reftable_write_options` to
allow external callers to disable auto-compaction. This will be used in
a subsequent commit.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever we commit a new table to the reftable stack we will end up
invoking auto-compaction of the stack to keep the total number of tables
at bay. This auto-compaction may fail though in case at least one of the
tables which we are about to compact is locked. This is indicated by the
compaction function returning `REFTABLE_LOCK_ERROR`. We do not handle
this case though, and thus bubble that return value up the calling
chain, which will ultimately cause a failure.
Fix this bug by ignoring `REFTABLE_LOCK_ERROR`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Compaction of a reftable stack may fail gracefully when there is a
concurrent process that writes to the reftable stack and which has thus
locked either the "tables.list" file or one of the tables. This is
expected and can be handled gracefully by some of the callers which
invoke compaction. Thus, to indicate this situation to our callers, we
return a positive return code from `stack_compact_range()` and bubble it
up to the caller.
This kind of error handling is somewhat awkward though as many callers
in the call chain never even think of handling positive return values.
Thus, the result is either that such errors are swallowed by accident,
or that we abort operations with an unhelpful error message.
Make the code more robust by always using negative error codes when
compaction fails, with `REFTABLE_LOCK_ERROR` for the described benign
error case.
Note that only a single callsite knew to handle positive error codes
gracefully in the first place. Subsequent commits will touch up some of
the other sites to handle those errors better.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently throw two different errors into a similar-but-different
error code:
- Errors when trying to lock the reftable stack.
- Errors when trying to write to the reftable stack which has been
modified concurrently.
This results in unclear error handling and user-visible error messages.
Create a new `REFTABLE_OUTDATED_ERROR` so that those error conditions
can be clearly told apart from each other. Adjust users of the old
`REFTABLE_LOCK_ERROR` to use the new error code as required.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `reftable_stack_init_addition()` we call `stack_uptodate()` after
having created the lockfile to check whether the stack was modified
concurrently, which is indicated by a positive return code from the
latter function. If so, we return a `REFTABLE_LOCK_ERROR` to the caller
and abort the addition.
The error handling has an off-by-one though because we check whether the
error code is `> 1` instead of `> 0`. Thus, instead of returning the
locking error, we would return a positive value. One of the callers of
`reftable_stack_init_addition()` works around this bug by repeating the
error code check without the off-by-one. But other callers are subtly
broken by this bug.
Fix this by checking for `err > 0` instead. This has the consequence
that `reftable_stack_init_addition()` won't ever return a positive error
code anymore, but will instead return `REFTABLE_LOCK_ERROR` now. Thus,
we can drop the check for a positive error code in `stack_try_add()`
now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not register tables resulting from stack compaction with the
tempfile API. Those tables will thus not be deleted in case Git gets
killed.
Refactor the code to register compacted tables as tempfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not register any of the locks we acquire when compacting the
reftable stack via our lockfiles interfaces. These locks will thus not
be released when Git gets killed.
Refactor the code to register locks as lockfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not register new tables which we're about to add to the stack with
the tempfile API. Those tables will thus not be deleted in case Git gets
killed.
Refactor the code to register tables as tempfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up in various reftable code paths.
* ps/reftable-styles:
reftable/record: improve semantics when initializing records
reftable/merged: refactor initialization of iterators
reftable/merged: refactor seeking of records
reftable/stack: use `size_t` to track stack length
reftable/stack: use `size_t` to track stack slices during compaction
reftable/stack: index segments with `size_t`
reftable/stack: fix parameter validation when compacting range
reftable: introduce macros to allocate arrays
reftable: introduce macros to grow arrays
The write codepath for the reftable data learned to honor
core.fsync configuration.
* jc/reftable-core-fsync:
reftable/stack: fsync "tables.list" during compaction
reftable: honor core.fsync
While the stack length is already stored as `size_t`, we frequently use
`int`s to refer to those stacks throughout the reftable library. Convert
those cases to use `size_t` instead to make things consistent.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use `int`s to track reftable slices when compacting the reftable
stack, which is considered to be a code smell in the Git project.
Convert the code to use `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use `int`s to index into arrays of segments and track the length of
them, which is considered to be a code smell in the Git project. Convert
the code to use `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `stack_compact_range()` function receives a "first" and "last" index
that indicates which tables of the reftable stack should be compacted.
Naturally, "first" must be smaller than "last" in order to identify a
proper range of tables to compress, which we indeed also assert in the
function. But the validations happens after we have already allocated
arrays with a size of `last - first + 1`, leading to an underflow and
thus an invalid allocation size.
Fix this by reordering the array allocations to happen after we have
validated parameters. While at it, convert the array allocations to use
the newly introduced macros.
Note that the relevant variables pointing into arrays should also be
converted to use `size_t` instead of `int`. This is left for a later
commit in this series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, let's carry over macros to allocate
arrays with `REFTABLE_ALLOC_ARRAY()` and `REFTABLE_CALLOC_ARRAY()`. This
requires us to change the signature of `reftable_calloc()`, which only
takes a single argument right now and thus puts the burden on the caller
to calculate the final array's size. This is a net improvement though as
it means that we can now provide proper overflow checks when multiplying
the array size with the member size.
Convert callsites of `reftable_calloc()` to the new signature and start
using the new macros where possible.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Throughout the reftable library we have many cases where we need to grow
arrays. In order to avoid too many reallocations, we roughly double the
capacity of the array on each iteration. The resulting code pattern is
duplicated across many sites.
We have similar patterns in our main codebase, which is why we have
eventually introduced an `ALLOC_GROW()` macro to abstract it away and
avoid some code duplication. We cannot easily reuse this macro here
though because `ALLOC_GROW()` uses `REALLOC_ARRAY()`, which in turn will
call realloc(3P) to grow the array. The reftable code is structured as a
library though (even if the boundaries are fuzzy), and one property this
brings with it is that it is possible to plug in your own allocators. So
instead of using realloc(3P), we need to use `reftable_realloc()` that
knows to use the user-provided implementation.
So let's introduce two new macros `REFTABLE_REALLOC_ARRAY()` and
`REFTABLE_ALLOC_GROW()` that mirror what we do in our main codebase,
with two modifications:
- They use `reftable_realloc()`, as explained above.
- They use a different growth factor of `2 * cap + 1` instead of `(cap
+ 16) * 3 / 2`.
The second change is because we know a bit more about the allocation
patterns in the reftable library. In most cases, we end up only having a
handful of items in the array and don't end up growing them. The initial
capacity that our normal growth factor uses (which is 24) would thus end
up over-allocating in a lot of code paths. This effect is measurable:
- Before change:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,402 bytes allocated
- After change with a growth factor of `(2 * alloc + 1)`:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,410 bytes allocated
- After change with a growth factor of `(alloc + 16)* 2 / 3`:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,833,673 allocs, 3,833,521 frees, 4,728,251,742 bytes allocated
While the total heap usage is roughly the same, we do end up allocating
significantly more bytes with our usual growth factor (in fact, roughly
21 times as many).
Convert the reftable library to use these new macros.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 1df18a1c9a (reftable: honor core.fsync, 2024-01-23), we have added
code to fsync both newly written reftables as well as "tables.list" to
disk. But there are two code paths where "tables.list" is being written:
- When appending a new table due to a normal ref update.
- When compacting a range of tables during compaction.
We have only addressed the former code path, but do not yet sync the new
"tables.list" file in the latter. Fix this omission.
Note that we are not yet adding any tests. These tests will be added
once the "reftable" backend has been upstreamed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Low-level I/O optimization for reftable.
* ps/reftable-optimize-io:
reftable/stack: fix race in up-to-date check
reftable/stack: unconditionally reload stack after commit
reftable/blocksource: use mmap to read tables
reftable/blocksource: refactor code to match our coding style
reftable/stack: use stat info to avoid re-reading stack list
reftable/stack: refactor reloading to use file descriptor
reftable/stack: refactor stack reloading to have common exit path
When creating a new compacted table from a range of preexisting ones we
don't set the default permissions on the resulting table when specified
by the user. This has the effect that the "core.sharedRepository" config
will not be honored correctly.
Fix this bug and add a test to catch this issue. Note that we only test
on non-Windows platforms because Windows does not use POSIX permissions
natively.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the reffiles backend honors configured fsync settings, the
reftable backend does not. Address this by fsyncing reftable files using
the write-or-die api's fsync_component() in two places: when we
add additional entries into the table, and when we close the reftable
writer.
This commits adds a flush function pointer as a new member of
reftable_writer because we are not sure that the first argument to the
*write function pointer always contains a file descriptor. In the case of
strbuf_add_void, the first argument is a buffer. This way, we can pass
in a corresponding flush function that knows how to flush depending on
which writer is being used.
This patch does not contain tests as they will need to wait for another
patch to start to exercise the reftable backend. At that point, the
tests will be added to observe that fsyncs are happening when the
reftable is in use.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 6fdfaf15a0 (reftable/stack: use stat info to avoid re-reading stack
list, 2024-01-11) we have introduced a new mechanism to avoid re-reading
the table list in case stat(3P) figures out that the stack didn't change
since the last time we read it.
While this change significantly improved performance when writing many
refs, it can unfortunately lead to false negatives in very specific
scenarios. Given two processes A and B, there is a feasible sequence of
events that cause us to accidentally treat the table list as up-to-date
even though it changed:
1. A reads the reftable stack and caches its stat info.
2. B updates the stack, appending a new table to "tables.list". This
will both use a new inode and result in a different file size, thus
invalidating A's cache in theory.
3. B decides to auto-compact the stack and merges two tables. The file
size now matches what A has cached again. Furthermore, the
filesystem may decide to recycle the inode number of the file we
have replaced in (2) because it is not in use anymore.
4. A reloads the reftable stack. Neither the inode number nor the
file size changed. If the timestamps did not change either then we
think the cached copy of our stack is up-to-date.
In fact, the commit introduced three related issues:
- Non-POSIX compliant systems may not report proper `st_dev` and
`st_ino` values in stat(3P), which made us rely solely on the
file's potentially coarse-grained mtime and ctime.
- `stat_validity_check()` and friends may end up not comparing
`st_dev` and `st_ino` depending on the "core.checkstat" config,
again reducing the signal to the mtime and ctime.
- `st_ino` can be recycled, rendering the check moot even on
POSIX-compliant systems.
Given that POSIX defines that "The st_ino and st_dev fields taken
together uniquely identify the file within the system", these issues led
to the most important signal to establish file identity to be ignored or
become useless in some cases.
Refactor the code to stop using `stat_validity_check()`. Instead, we
manually stat(3P) the file descriptors to make relevant information
available. On Windows and MSYS2 the result will have both `st_dev` and
`st_ino` set to 0, which allows us to address the first issue by not
using the stat-based cache in that case. It also allows us to make sure
that we always compare `st_dev` and `st_ino`, addressing the second
issue.
The third issue of inode recycling can be addressed by keeping the file
descriptor of "files.list" open during the lifetime of the reftable
stack. As the file will still exist on disk even though it has been
unlinked it is impossible for its inode to be recycled as long as the
file descriptor is still open.
This should address the race in a POSIX-compliant way. The only real
downside is that this mechanism cannot be used on non-POSIX-compliant
systems like Windows. But we at least have the second-level caching
mechanism in place that compares contents of "files.list" with the
currently loaded list of tables.
This new mechanism performs roughly the same as the previous one that
relied on `stat_validity_check()`:
Benchmark 1: update-ref: create many refs (HEAD~)
Time (mean ± σ): 4.754 s ± 0.026 s [User: 2.204 s, System: 2.549 s]
Range (min … max): 4.694 s … 4.802 s 20 runs
Benchmark 2: update-ref: create many refs (HEAD)
Time (mean ± σ): 4.721 s ± 0.020 s [User: 2.194 s, System: 2.527 s]
Range (min … max): 4.691 s … 4.753 s 20 runs
Summary
update-ref: create many refs (HEAD~) ran
1.01 ± 0.01 times faster than update-ref: create many refs (HEAD)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After we have committed an addition to the reftable stack we call
`reftable_stack_reload()` to reload the stack and thus reflect the
changes that were just added. This function will only conditionally
reload the stack in case `stack_uptodate()` tells us that the stack
needs reloading. This check is wasteful though because we already know
that the stack needs reloading.
Call `reftable_stack_reload_maybe_reuse()` instead, which will
unconditionally reload the stack. This is merely a conceptual fix, the
code in question was not found to cause any problems in practice.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>