bloom.c: core Bloom filter implementation for changed paths.

Add the core implementation for computing Bloom filters for
the paths changed between a commit and its first parent.

We fill the Bloom filters as (const char *data, int len) pairs,
stored as `struct bloom_filter` within a commit slab.

Filters for commits with no changes, or with more than 512 changes,
are represented with a filter of length zero. There is no gain
in distinguishing between a computed filter of length zero for
a commit with no changes, and an uncomputed filter for new commits
or for commits with more than 512 changes. The effect on
`git log -- path` is the same in both cases. We will fall back to
the normal diffing algorithm when we can't benefit from the
existence of Bloom filters.
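
The zero-length rule can be sketched roughly as below. This is an
illustration only, not the code added by this commit: the names
sketch_filter, sketch_stored_len, sketch_must_fall_back and
SKETCH_MAX_CHANGED_PATHS are hypothetical, and only the 512 threshold
and the (data, len) shape are taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; the value 512 comes from the message above. */
#define SKETCH_MAX_CHANGED_PATHS 512

/* Hypothetical stand-in for a per-commit filter: a (data, len) pair. */
struct sketch_filter {
	const unsigned char *data;
	int len; /* 0 means "nothing usable here" */
};

/*
 * Pick the length to store for a commit with `changed_paths` changes.
 * `computed_len` is the length a real filter would have had.
 * Commits with no changes, or with more than 512, get length zero.
 */
static int sketch_stored_len(int changed_paths, int computed_len)
{
	assert(changed_paths >= 0 && computed_len >= 0);
	if (changed_paths == 0 || changed_paths > SKETCH_MAX_CHANGED_PATHS)
		return 0;
	return computed_len;
}

/*
 * A reader does not need to know why the length is zero: a missing
 * filter and a zero-length filter both mean "use the normal diff".
 */
static int sketch_must_fall_back(const struct sketch_filter *filter)
{
	return filter == NULL || filter->len == 0;
}
```

Under these assumptions, a caller would check sketch_must_fall_back()
first and only query the filter when it returns zero, so the three
cases (no changes, too many changes, not yet computed) share one path.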

Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Garima Singh
2020-03-30 00:31:26 +00:00
committed by Junio C Hamano
parent f1294eaf7f
commit ed591febb4
4 changed files with 172 additions and 0 deletions

@@ -1,6 +1,9 @@
#ifndef BLOOM_H
#define BLOOM_H
struct commit;
struct repository;
struct bloom_filter_settings {
	/*
	 * The version of the hashing technique being used.
@@ -73,4 +76,9 @@ void add_key_to_filter(const struct bloom_key *key,
		       struct bloom_filter *filter,
		       const struct bloom_filter_settings *settings);
void init_bloom_filters(void);
struct bloom_filter *get_bloom_filter(struct repository *r,
				      struct commit *c);
#endif