diff histogram: intern strings

Histogram is the only diff algorithm that does not call
xdl_classify_record(). xdl_classify_record() ensures that the hash
values of two strings that are not equal differ, which means that it
is not necessary to use xdl_recmatch() when comparing lines: comparing
the hash values is sufficient. This gives a 7% reduction in the
runtime of "git log --patch" when using the histogram diff algorithm.
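
To illustrate why interning makes the hash comparison sufficient, here
is a minimal sketch, not the xdiff implementation. The names
intern_table, intern_line() and ids_match() are hypothetical, the
table uses a linear scan with a fixed capacity for brevity, and
whitespace-ignoring flags are not modelled. Each distinct line is
assigned a unique integer once, so every later comparison is a single
integer test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINES 64 /* hypothetical fixed capacity, illustration only */

/* Hypothetical interning table: one slot per distinct line seen so far. */
struct intern_table {
	const char *ptr[MAX_LINES];
	size_t size[MAX_LINES];
	size_t nr;
};

/*
 * Return a small integer ID for the line. Two lines get the same ID
 * if and only if their bytes are equal. The full byte comparison is
 * paid once here, so later comparisons need only the IDs.
 */
static unsigned long intern_line(struct intern_table *t,
				 const char *ptr, size_t size)
{
	size_t i;

	for (i = 0; i < t->nr; i++)
		if (t->size[i] == size && !memcmp(t->ptr[i], ptr, size))
			return i;
	assert(t->nr < MAX_LINES);
	t->ptr[t->nr] = ptr;
	t->size[t->nr] = size;
	return t->nr++;
}

/* With interned IDs, comparing two lines is a single integer test. */
static int ids_match(unsigned long a, unsigned long b)
{
	return a == b;
}
```

In this sketch the ID plays the role that the record's hash value
plays after classification: equal IDs imply equal lines, so no second
byte-wise check is needed.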

Test                                  HEAD^             HEAD
-----------------------------------------------------------------------------
4000.1: log -3000 (baseline)          0.18(0.14+0.04)   0.19(0.17+0.02) +5.6%
4000.2: log --raw -3000 (tree-only)   0.99(0.77+0.21)   0.98(0.78+0.20) -1.0%
4000.3: log -p -3000 (Myers)          4.84(4.31+0.51)   4.81(4.15+0.64) -0.6%
4000.4: log -p -3000 --histogram      6.34(5.86+0.46)   5.87(5.19+0.66) -7.4%
4000.5: log -p -3000 --patience       5.39(4.60+0.76)   5.35(4.60+0.73) -0.7%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood
2021-11-17 11:20:23 +00:00
committed by Junio C Hamano
parent cd3e606211
commit 663c5ad035
2 changed files with 10 additions and 19 deletions

@@ -91,9 +91,8 @@ struct region {
 static int cmp_recs(xpparam_t const *xpp,
 	xrecord_t *r1, xrecord_t *r2)
 {
-	return r1->ha == r2->ha &&
-		xdl_recmatch(r1->ptr, r1->size, r2->ptr, r2->size,
-			    xpp->flags);
+	return r1->ha == r2->ha;
 }
 #define CMP_ENV(xpp, env, s1, l1, s2, l2) \