midx: sort and deduplicate objects from packfiles

Before writing a list of objects and their offsets to a multi-pack-index,
we need to collect the list of objects contained in the packfiles. There
may be multiple copies of some objects, so this list must be deduplicated.

It is possible to artificially get into a state where there are many
duplicate copies of objects. That can create high memory pressure if we
are to create a list of all objects before de-duplication. To reduce
this memory pressure without a significant performance drop,
automatically group objects by the first byte of their object id. Use
the IDX fanout tables to group the data, copy to a local array, then
sort.

Copy only the de-duplicated entries. Select the duplicate based on the
most-recent modified time of a packfile containing the object.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Derrick Stolee
2018-07-12 15:39:29 -04:00
committed by Junio C Hamano
parent 3227565cfd
commit fe1ed56f5e
3 changed files with 147 additions and 0 deletions

View File

@ -69,6 +69,8 @@ extern int open_pack_index(struct packed_git *);
*/
extern void close_pack_index(struct packed_git *);
extern uint32_t get_pack_fanout(struct packed_git *p, uint32_t value);
extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
extern void close_pack_windows(struct packed_git *);
extern void close_pack(struct packed_git *);