pack-revindex: read multi-pack reverse indexes

Implement reading for multi-pack reverse indexes, as described in the
previous patch.

Note that these functions don't yet have any callers, and won't until
multi-pack reachability bitmaps are introduced in a later patch series.
In the meantime, this patch implements some of the infrastructure
necessary to support multi-pack bitmaps.

There are three new functions exposed by the revindex API:

  - load_midx_revindex(): loads the reverse index corresponding to the
    given multi-pack index.

  - midx_to_pack_pos() and pack_pos_to_midx(): these convert between the
    multi-pack index and pseudo-pack order.

load_midx_revindex() and pack_pos_to_midx() are both relatively
straightforward.

load_midx_revindex() needs a few functions to be exposed from the midx
API: one to get the checksum of a midx, and another to get the .rev's
filename. Similar to recent changes in the packed_git struct, three new
fields are added to the multi_pack_index struct: one to keep track of
the size, one to keep track of the mmap'd pointer, and another to point
past the header and at the reverse index's data.
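
As a rough illustration of how those three fields relate, here is a
minimal sketch operating on an in-memory buffer instead of a real mmap.
The struct, the field names, the helper, and the 12-byte header size
are all assumptions made for this example, not the code in this patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed header size: signature, version, hash id (4 bytes each). */
#define SKETCH_HEADER_SIZE 12

/* Hypothetical stand-in for the three new multi_pack_index fields. */
struct midx_rev_fields {
	const unsigned char *revindex_map;  /* start of the mapped file */
	const unsigned char *revindex_data; /* first byte past the header */
	size_t revindex_len;                /* size of the mapping in bytes */
};

/*
 * Record where a mapped .rev file lives and where its table starts.
 * Returns 0 on success and a negative number on error.
 */
static int sketch_attach_revindex(struct midx_rev_fields *m,
				  const unsigned char *map, size_t len)
{
	if (!map || len < SKETCH_HEADER_SIZE)
		return -1; /* too short to contain even the header */

	m->revindex_map = map;
	m->revindex_data = map + SKETCH_HEADER_SIZE;
	m->revindex_len = len;
	return 0;
}
```

Keeping both the map pointer and the data pointer means the mapping can
later be released with its original address and length, while lookups
only ever touch the data pointer.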

pack_pos_to_midx() simply reads the corresponding entry out of the
table.
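
A minimal sketch of that lookup follows. It assumes the table holds one
4-byte network-order entry per object; the struct and function names
are hypothetical and only model the idea:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant parts of multi_pack_index. */
struct midx_sketch {
	const unsigned char *revindex_data; /* table of 4-byte entries */
	uint32_t num_objects;
};

/* Decode a 4-byte big-endian (network order) integer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Pseudo-pack position -> MIDX position: a single table read. */
static uint32_t sketch_pack_pos_to_midx(const struct midx_sketch *m,
					uint32_t pos)
{
	assert(m->revindex_data != NULL); /* reverse index must be loaded */
	assert(pos < m->num_objects);     /* position must be in bounds */
	return read_be32(m->revindex_data + (size_t)pos * 4);
}
```

Because the table is indexed directly by pseudo-pack position, this
direction costs one read no matter how many objects the MIDX holds.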

midx_to_pack_pos() is the trickiest, since it needs to find an object's
position in the pseudo-pack order, but that order can only be recovered
from the .rev file itself. This mapping can be implemented with a binary
search, but note that the thing we're binary searching over isn't an
array of values, but rather a permuted order of those values.

So, when comparing two items, it's helpful to keep in mind the
difference. Instead of a traditional binary search, where you are
comparing two things directly, here we're comparing a (pack, offset)
tuple with an index into the multi-pack index. That index describes
another (pack, offset) tuple, and it is _those_ two tuples that are
compared.
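
The comparison described above can be sketched as follows. This is a
simplified in-memory model, not the code in this patch: the struct and
function names are hypothetical, and the ordering ignores the
preferred-pack rule for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (pack, offset) tuple describing one object. */
struct obj_key {
	uint32_t pack_id;
	uint64_t offset;
};

/* Hypothetical in-memory model of a MIDX plus its reverse index. */
struct midx_model {
	const struct obj_key *objects; /* indexed by MIDX position */
	const uint32_t *rev;           /* pseudo-pack pos -> MIDX pos */
	uint32_t num_objects;
};

/* Order tuples by pack first, then by offset within the pack. */
static int key_cmp(const struct obj_key *a, const struct obj_key *b)
{
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* MIDX position -> pseudo-pack position; 0 on success, -1 if absent. */
static int sketch_midx_to_pack_pos(const struct midx_model *m,
				   uint32_t at, uint32_t *pos)
{
	const struct obj_key *key;
	uint32_t lo = 0, hi = m->num_objects;

	assert(at < m->num_objects);
	key = &m->objects[at];

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		/* rev[mid] is a MIDX position; compare the tuple behind it. */
		const struct obj_key *probe = &m->objects[m->rev[mid]];
		int cmp = key_cmp(key, probe);

		if (!cmp) {
			*pos = mid;
			return 0;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

The search never compares "at" with rev[mid] directly. Both sides are
first resolved to (pack, offset) tuples, which is the only ordering in
which the permuted table is sorted.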

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau
2021-03-30 11:04:26 -04:00
committed by Junio C Hamano
parent b25fd24c00
commit f894081dea
5 changed files with 199 additions and 0 deletions

@@ -14,6 +14,20 @@
*
* - offset: the byte offset within the .pack file at which the object contents
* can be found
*
* The revindex can also be used with a multi-pack index (MIDX). In this
* setting:
*
* - index position refers to an object's numeric position within the MIDX
*
* - pack position refers to an object's position within a non-existent pack
* described by the MIDX. The pack structure is described in
* Documentation/technical/pack-format.txt.
*
* It is effectively a concatenation of all packs in the MIDX (ordered by
* their numeric ID within the MIDX, in their original order within each
* pack), removing duplicates, and placing the preferred pack (if any)
* first.
*/
@@ -24,6 +38,7 @@
#define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY"
struct packed_git;
struct multi_pack_index;
/*
* load_pack_revindex populates the revindex's internal data-structures for the
@@ -34,6 +49,22 @@ struct packed_git;
*/
int load_pack_revindex(struct packed_git *p);
/*
* load_midx_revindex loads the '.rev' file corresponding to the given
* multi-pack index by mmap-ing it and assigning pointers in the
* multi_pack_index to point at it.
*
* A negative number is returned on error.
*/
int load_midx_revindex(struct multi_pack_index *m);
/*
* Frees resources associated with a multi-pack reverse index.
*
* A negative number is returned on error.
*/
int close_midx_revindex(struct multi_pack_index *m);
/*
* offset_to_pack_pos converts an object offset to a pack position. This
* function returns zero on success, and a negative number otherwise. The
@@ -71,4 +102,26 @@ uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
*/
off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos);
/*
* pack_pos_to_midx converts the object at position "pos" within the MIDX
* pseudo-pack into a MIDX position.
*
* If the reverse index has not yet been loaded, or the position is out of
* bounds, this function aborts.
*
* This function runs in constant time.
*/
uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos);
/*
* midx_to_pack_pos converts from the MIDX-relative position at "at" to the
* corresponding pack position.
*
* If the reverse index has not yet been loaded, or the position is out of
* bounds, this function aborts.
*
* This function runs in time O(log N) with the number of objects in the MIDX.
*/
int midx_to_pack_pos(struct multi_pack_index *midx, uint32_t at, uint32_t *pos);
#endif