The `block_source_read_block()` function and its implementations return
an integer as a result that reflects either the number of bytes read, or
an error. As such its return type, a signed integer, isn't wrong, but it
doesn't give the reader a good hint what it actually returns.
Refactor the function to return an `ssize_t` instead, which is typical
for functions similar to read(3p) and should thus give readers a better
signal what they can expect as a result.
Adjust callers to better handle the returned value to avoid warnings
with -Wsign-compare. One of these callers is `reader_get_block()`, whose
return value is only ever used by its callers to figure out whether or
not the read was successful. So instead of bubbling up the `ssize_t`
there, too, we adapt it to only indicate success or errors.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We include "hash.h" in "reftable/system.h" such that we can use hash
format IDs as well as the raw size of SHA1 and SHA256. As we are in the
process of converting the reftable library to become standalone we of
course cannot rely on those constants anymore.
Introduce a new `enum reftable_hash` to replace internal uses of the
hash format IDs and new constants that replace internal uses of the hash
size. Adapt the reftable backend to set up the correct hash function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library is now prepared to expect that the memory
allocation function given to it may fail to allocate and to deal
with such an error.
* ps/reftable-alloc-failures: (26 commits)
reftable/basics: fix segfault when growing `names` array fails
reftable/basics: ban standard allocator functions
reftable: introduce `REFTABLE_FREE_AND_NULL()`
reftable: fix calls to free(3P)
reftable: handle trivial allocation failures
reftable/tree: handle allocation failures
reftable/pq: handle allocation failures when adding entries
reftable/block: handle allocation failures
reftable/blocksource: handle allocation failures
reftable/iter: handle allocation failures when creating indexed table iter
reftable/stack: handle allocation failures in auto compaction
reftable/stack: handle allocation failures in `stack_compact_range()`
reftable/stack: handle allocation failures in `reftable_new_stack()`
reftable/stack: handle allocation failures on reload
reftable/reader: handle allocation failures in `reader_init_iter()`
reftable/reader: handle allocation failures for unindexed reader
reftable/merged: handle allocation failures in `merged_table_init_iter()`
reftable/writer: handle allocation failures in `reftable_new_writer()`
reftable/writer: handle allocation failures in `writer_index_hash()`
reftable/record: handle allocation failures when decoding records
...
Handle allocation failures in `reader_init_iter()`. This requires us to
also adapt `reftable_reader_init_*_iterator()` to bubble up the new
error codes. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was recently reported that concurrent reads and writes may cause the
reftable backend to segfault. The root cause of this is that we do not
properly keep track of reftable readers across reloads.
Suppose that you have a reftable iterator and then decide to reload the
stack while iterating through the iterator. When the stack has been
rewritten since we have created the iterator, then we would end up
discarding a subset of readers that may still be in use by the iterator.
The consequence is that we now try to reference deallocated memory,
which of course segfaults.
One way to trigger this is in t5616, where some background maintenance
jobs have been leaking from one test into another. This leads to stack
traces like the following one:
+ git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
AddressSanitizer:DEADLYSIGNAL
=================================================================
==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp
0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
==657994==The signal is caused by a READ memory access.
#0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
#1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
#2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
#3 0x55f23e54e72e in block_iter_next reftable/block.c:398
#4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
#5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
#6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
#7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
#8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
#9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
#10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
#11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
#12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
#13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
#14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
#15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
#16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
#17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
#18 0x55f23dba7764 in run_builtin git.c:484
#19 0x55f23dba7764 in handle_builtin git.c:741
#20 0x55f23dbab61e in run_argv git.c:805
#21 0x55f23dbab61e in cmd_main git.c:1000
#22 0x55f23dba4781 in main common-main.c:64
#23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
#25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)
While it is somewhat awkward that the maintenance processes survive
tests in the first place, it is totally expected that reftables should
work alright with concurrent writers. Seemingly they don't.
The only underlying resource that we need to care about in this context
is the reftable reader, which is responsible for reading a single table
from disk. These readers get discarded immediately (unless reused) when
calling `reftable_stack_reload()`, which is wrong. We can only close
them once we know that there are no iterators using them anymore.
Prepare for a fix by converting the reftable readers to be refcounted.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as with the preceding commit, we also provide a `reader_close()`
function that allows the caller to close a reader without freeing it.
This is unnecessary now that all users will have an allocated version of
the reader.
Inline it into `reftable_reader_free()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most users use an allocated version of the `reftable_reader`, except for
some tests. We are about to convert the reader to become refcounted
though, and providing the ability to keep a reader on the stack makes
this conversion harder than necessary.
Update the tests to use `reftable_reader_new()` instead to prepare for
this change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merged table provides access to a reftable stack by merging the
contents of those tables into a virtual table. These subtables are being
tracked via `struct reftable_table`, which is a generic interface for
accessing either a single reftable or a merged reftable. So in theory,
it would be possible for the merged table to merge together other merged
tables.
This is somewhat nonsensical though: we only ever set up a merged table
over normal reftables, and there is no reason to do otherwise. This
generic interface thus makes the code way harder to follow and reason
about than really necessary. The abstraction layer may also have an
impact on performance, even though the extra set of vtable function
calls probably doesn't really matter.
Refactor the merged tables to use a `struct reftable_reader` for each of
the subtables instead, which gives us direct access to the underlying
tables. Adjust names accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This supports reading a single reftable file.
The commit introduces an abstract iterator type, which captures the usecases
both of reading individual refs, and iterating over a segment of the ref
namespace.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>