git/perf at 8320b1dbe7d160ea08dec931cf17dc39682bfb91 - git

Files

Jeff King 8320b1dbe7 revision: use a prio_queue to hold rewritten parents

This patch fixes a quadratic list insertion in rewrite_one() when
pathspec limiting is combined with --parents. What happens is something
like this:

  1. We see that some commit X touches the path, so we try to rewrite
     its parents.

  2. rewrite_one() loops forever, rewriting parents, until it finds a
     relevant parent (or hits the root and decides there are none). The
     heavy lifting is done by process_parent(), which uses
     try_to_simplify_commit() to drop parents.

  3. process_parent() puts any intermediate parents into the
     &revs->commits list, inserting by commit date as usual.

So if commit X is recent, and then there's a large chunk of history that
doesn't touch the path, we may add a lot of commits to &revs->commits.
And insertion by commit date is O(n) in the worst case, making the whole
thing quadratic.

We tried to deal with this long ago in fce87ae538 (Fix quadratic
performance in rewrite_one., 2008-07-12). In that scheme, we cache the
oldest commit in the list; if the new commit to be added is older, we
can start our linear traversal there. This often works well in practice
because parents are older than their descendants, and thus we tend to
add older and older commits as we traverse.

But this isn't guaranteed, and in fact there's a simple case where it is
not: merges. Imagine we look at the first parent of a merge and see a
very old commit (let's say 3 years old). And on the second parent, as we
go back 3 years in history, we might have many commits. That one
first-parent commit has polluted our oldest-commit cache; it will remain
the oldest while we traverse a huge chunk of history, during which we
have to fall back to the slow, linear method of adding to the list.

Naively, one might imagine that instead of caching the oldest commit,
we'd start at the last-added one. But that just makes some cases faster
while making others slower (and indeed, while it made a real-world test
case much faster, it does quite poorly in the perf test include here).
Fundamentally, these are just heuristics; our worst case is still
quadratic, and some cases will approach that.

Instead, let's use a data structure with better worst-case performance.
Swapping out revs->commits for something else would have repercussions
all over the code base, but we can take advantage of one fact: for the
rewrite_one() case, nobody actually needs to see those commits in
revs->commits until we've finished generating the whole list.

That leaves us with two obvious options:

  1. We can generate the list _unordered_, which should be O(n), and
     then sort it afterwards, which would be O(n log n) total. This is
     "sort-after" below.

  2. We can insert the commits into a separate data structure, like a
     priority queue. This is "prio-queue" below.

I expected that sort-after would be the fastest (since it saves us the
extra step of copying the items into the linked list), but surprisingly
the prio-queue seems to be a bit faster.

Here are timings for the new p0001.6 for all three techniques across a
few repositories, as compared to master:

master              cache-last                sort-after              prio-queue
--------------------------------------------------------------------------------------------
GIT_PERF_REPO=git.git
0.52(0.50+0.02)      0.53(0.51+0.02)  +1.9%   0.37(0.33+0.03) -28.8%  0.37(0.32+0.04) -28.8%

GIT_PERF_REPO=linux.git
20.81(20.74+0.07)   20.31(20.24+0.07) -2.4%   0.94(0.86+0.07) -95.5%  0.91(0.82+0.09) -95.6%

GIT_PERF_REPO=llvm-project.git
83.67(83.57+0.09)    4.23(4.15+0.08) -94.9%   3.21(3.15+0.06) -96.2%  2.98(2.91+0.07) -96.4%

A few items to note:

  - the cache-list tweak does improve the bad case for llvm-project.git
    that started my digging into this problem. But it performs terribly
    on linux.git, barely helping at all.

  - the sort-after and prio-queue techniques work well. They approach
    the timing for running without --parents at all, which is what you'd
    expect (see below for more data).

  - prio-queue just barely outperforms sort-after. As I said, I'm not
    really sure why this is the case, but it is. You can see it even
    more prominently in this real-world case on llvm-project.git:

      git rev-list --parents 07ef786652e7 -- llvm/test/CodeGen/Generic/bswap.ll

    where prio-queue routinely outperforms sort-after by about 7%. One
    guess is that the prio-queue may just be more efficient because it
    uses a compact array.

There are three new perf tests:

  - "rev-list --parents" gives us a baseline for running with --parents.
    This isn't sped up meaningfully here, because the bad case is
    triggered only with simplification. But it's good to make sure we
    don't screw it up (now, or in the future).

  - "rev-list -- dummy" gives us a baseline for just traversing with
    pathspec limiting. This gives a lower bound for the next test (and
    it's also a good thing for us to be checking in general for
    regressions, since we don't seem to have any existing tests).

  - "rev-list --parents -- dummy" shows off the problem (and our fix)

Here are the timings for those three on llvm-project.git, before and
after the fix:

Test                                 master              prio-queue
------------------------------------------------------------------------------
0001.3: rev-list --parents           2.24(2.12+0.12)     2.22(2.11+0.11) -0.9%
0001.5: rev-list -- dummy            2.89(2.82+0.07)     2.92(2.89+0.03) +1.0%
0001.6: rev-list --parents -- dummy  83.67(83.57+0.09)   2.98(2.91+0.07) -96.4%

Changes in the first two are basically noise, and you can see we
approach our lower bound in the final one.

Note that we can't fully get rid of the list argument from
process_parents(). Other callers do have lists, and it would be hard to
convert them. They also don't seem to have this problem (probably
because they actually remove items from the list as they loop, meaning
it doesn't grow so large in the first place). So this basically just
drops the "cache_ptr" parameter (which was used only by the one caller
we're fixing here) and replaces it with a prio_queue. Callers are free
to use either data structure, depending on what they're prepared to
handle.

Reported-by: Björn Pettersson A <bjorn.a.pettersson@ericsson.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

2019-04-04 18:21:54 +09:00

repos

p0006-read-tree-checkout: perf test to time read-tree

2017-04-19 20:33:01 -07:00

.gitignore

…

aggregate.perl

t/perf: add infrastructure for measuring sizes

2018-08-20 14:04:47 -07:00

bisect_regression

t/perf: add scripts to bisect performance regressions

2018-04-11 15:14:02 +09:00

bisect_run_script

perf/bisect_run_script: disable codespeed

2018-05-06 13:04:54 +09:00

lib-pack.sh

t/perf/lib-pack: use fast-import checkpoint to create packs

2017-11-21 11:07:28 +09:00

Makefile

…

min_time.perl

…

p0000-perf-lib-sanity.sh

t/perf: correctly align non-ASCII descriptions in output

2017-04-23 21:33:15 -07:00

p0001-rev-list.sh

revision: use a prio_queue to hold rewritten parents

2019-04-04 18:21:54 +09:00

p0002-read-cache.sh

t/helper: merge test-read-cache into test-tool

2018-03-27 08:45:47 -07:00

p0003-delta-base-cache.sh

t/perf: add basic perf tests for delta base cache

2016-08-23 15:26:16 -07:00

p0004-lazy-init-name-hash.sh

t/helper: merge test-lazy-init-name-hash into test-tool

2018-03-27 08:45:47 -07:00

p0005-status.sh

string-list: use ALLOC_GROW macro when reallocing string_list

2017-04-15 02:04:41 -07:00

p0006-read-tree-checkout.sh

p0006-read-tree-checkout: perf test to time read-tree

2017-04-19 20:33:01 -07:00

p0007-write-cache.sh

t/helper: merge test-write-cache into test-tool

2018-03-27 08:45:47 -07:00

p0071-sort.sh

t/helper: merge test-string-list into test-tool

2018-03-27 08:45:47 -07:00

p0100-globbing.sh

perf: add test showing exponential growth in path globbing

2017-05-12 10:07:43 +09:00

p1450-fsck.sh

fsck: add a performance test

2018-09-12 15:17:46 -07:00

p1451-fsck-skip-list.sh

fsck: add a performance test for skipList

2018-09-12 15:17:46 -07:00

p3400-rebase.sh

p3400: replace calls to git checkout -b' by git checkout -B'

2018-11-12 16:40:55 +09:00

p3404-rebase-interactive.sh

perf: run "rebase -i" under perf

2016-05-13 11:07:12 -07:00

p4000-diff-algorithms.sh

…

p4001-diff-no-index.sh

…

p4205-log-pretty-formats.sh

p4205: add perf test script for pretty log formats

2017-06-24 11:05:02 -07:00

p4211-line-log.sh

sha1_file: use strbuf_add() instead of strbuf_addf()

2017-12-04 10:38:55 -08:00

p4220-log-grep-engines.sh

perf: add a comparison test of log --grep regex engines

2017-05-26 12:52:37 +09:00

p4221-log-grep-engines-fixed.sh

perf: add a comparison test of log --grep regex engines with -F

2017-05-26 12:52:37 +09:00

p5302-pack-index.sh

p5302: create repositories for index-pack results explicitly

2017-02-06 10:55:25 -08:00

p5303-many-packs.sh

t/perf: add tests for many-pack scenarios

2016-07-29 11:05:06 -07:00

p5310-pack-bitmaps.sh

pack-objects: use reachability bitmap index when generating non-stdout pack

2016-09-12 13:47:41 -07:00

p5311-pack-bitmaps-fetch.sh

t/perf: add perf tests for fetches from a bitmapped server

2018-08-20 14:04:47 -07:00

p5550-fetch-tags.sh

p5550: factor out nonsense-pack creation

2017-11-21 11:07:12 +09:00

p5551-fetch-rescan.sh

p5551: add a script to test fetch pack-dir rescans

2017-11-21 11:08:20 +09:00

p7000-filter-branch.sh

p7000: add test for filter-branch with --prune-empty

2017-03-03 12:43:37 -08:00

p7300-clean.sh

resolve_gitlink_ref: ignore non-repository paths

2016-01-25 11:42:13 -08:00

p7519-fsmonitor.sh

t/helper: merge test-drop-caches into test-tool

2018-03-27 08:45:47 -07:00

p7810-grep.sh

…

p7820-grep-engines.sh

perf: amend the grep tests to test grep.threads

2018-01-04 10:24:48 -08:00

p7821-grep-engines-fixed.sh

perf: amend the grep tests to test grep.threads

2018-01-04 10:24:48 -08:00

perf-lib.sh

tests: send "bug in the test script" errors to the script's stderr

2018-11-20 12:16:35 +09:00

README

t/perf: add infrastructure for measuring sizes

2018-08-20 14:04:47 -07:00

run

perf/run: add --subsection option

2018-04-11 15:14:00 +09:00

README

Git performance tests
=====================

This directory holds performance testing scripts for git tools.  The
first part of this document describes the various ways in which you
can run them.

When fixing the tools or adding enhancements, you are strongly
encouraged to add tests in this directory to cover what you are
trying to fix or enhance.  The later part of this short document
describes how your test scripts should be organized.


Running Tests
-------------

The easiest way to run tests is to say "make".  This runs all
the tests on the current git repository.

    === Running 2 tests in this tree ===
    [...]
    Test                                     this tree
    ---------------------------------------------------------
    0001.1: rev-list --all                   0.54(0.51+0.02)
    0001.2: rev-list --all --objects         6.14(5.99+0.11)
    7810.1: grep worktree, cheap regex       0.16(0.16+0.35)
    7810.2: grep worktree, expensive regex   7.90(29.75+0.37)
    7810.3: grep --cached, cheap regex       3.07(3.02+0.25)
    7810.4: grep --cached, expensive regex   9.39(30.57+0.24)

You can compare multiple repositories and even git revisions with the
'run' script:

    $ ./run . origin/next /path/to/git-tree p0001-rev-list.sh

where . stands for the current git tree.  The full invocation is

    ./run [<revision|directory>...] [--] [<test-script>...]

A '.' argument is implied if you do not pass any other
revisions/directories.

You can also manually test this or another git build tree, and then
call the aggregation script to summarize the results:

    $ ./p0001-rev-list.sh
    [...]
    $ GIT_BUILD_DIR=/path/to/other/git ./p0001-rev-list.sh
    [...]
    $ ./aggregate.perl . /path/to/other/git ./p0001-rev-list.sh

aggregate.perl has the same invocation as 'run', it just does not run
anything beforehand.

You can set the following variables (also in your config.mak):

    GIT_PERF_REPEAT_COUNT
	Number of times a test should be repeated for best-of-N
	measurements.  Defaults to 3.

    GIT_PERF_MAKE_OPTS
	Options to use when automatically building a git tree for
	performance testing. E.g., -j6 would be useful. Passed
	directly to make as "make $GIT_PERF_MAKE_OPTS".

    GIT_PERF_MAKE_COMMAND
	An arbitrary command that'll be run in place of the make
	command, if set the GIT_PERF_MAKE_OPTS variable is
	ignored. Useful in cases where source tree changes might
	require issuing a different make command to different
	revisions.

	This can be (ab)used to monkeypatch or otherwise change the
	tree about to be built. Note that the build directory can be
	re-used for subsequent runs so the make command might get
	executed multiple times on the same tree, but don't count on
	any of that, that's an implementation detail that might change
	in the future.

    GIT_PERF_REPO
    GIT_PERF_LARGE_REPO
	Repositories to copy for the performance tests.  The normal
	repo should be at least git.git size.  The large repo should
	probably be about linux.git size for optimal results.
	Both default to the git.git you are running from.

You can also pass the options taken by ordinary git tests; the most
useful one is:

--root=<directory>::
	Create "trash" directories used to store all temporary data during
	testing under <directory>, instead of the t/ directory.
	Using this option with a RAM-based filesystem (such as tmpfs)
	can massively speed up the test suite.


Naming Tests
------------

The performance test files are named as:

	pNNNN-commandname-details.sh

where N is a decimal digit.  The same conventions for choosing NNNN as
for normal tests apply.


Writing Tests
-------------

The perf script starts much like a normal test script, except it
sources perf-lib.sh:

	#!/bin/sh
	#
	# Copyright (c) 2005 Junio C Hamano
	#

	test_description='xxx performance test'
	. ./perf-lib.sh

After that you will want to use some of the following:

	test_perf_fresh_repo    # sets up an empty repository
	test_perf_default_repo  # sets up a "normal" repository
	test_perf_large_repo    # sets up a "large" repository

	test_perf_default_repo sub  # ditto, in a subdir "sub"

        test_checkout_worktree  # if you need the worktree too

At least one of the first two is required!

You can use test_expect_success as usual. In both test_expect_success
and in test_perf, running "git" points to the version that is being
perf-tested. The $MODERN_GIT variable points to the git wrapper for the
currently checked-out version (i.e., the one that matches the t/perf
scripts you are running).  This is useful if your setup uses commands
that only work with newer versions of git than what you might want to
test (but obviously your new commands must still create a state that can
be used by the older version of git you are testing).

For actual performance tests, use

	test_perf 'descriptive string' '
		command1 &&
		command2
	'

test_perf spawns a subshell, for lack of better options.  This means
that

* you _must_ export all variables that you need in the subshell

* you _must_ flag all variables that you want to persist from the
  subshell with 'test_export':

	test_perf 'descriptive string' '
		foo=$(git rev-parse HEAD) &&
		test_export foo
	'

  The so-exported variables are automatically marked for export in the
  shell executing the perf test.  For your convenience, test_export is
  the same as export in the main shell.

  This feature relies on a bit of magic using 'set' and 'source'.
  While we have tried to make sure that it can cope with embedded
  whitespace and other special characters, it will not work with
  multi-line data.

Rather than tracking the performance by run-time as `test_perf` does, you
may also track output size by using `test_size`. The stdout of the
function should be a single numeric value, which will be captured and
shown in the aggregated output. For example:

	test_perf 'time foo' '
		./foo >foo.out
	'

	test_size 'output size'
		wc -c <foo.out
	'

might produce output like:

	Test                origin           HEAD
	-------------------------------------------------------------
	1234.1 time foo     0.37(0.79+0.02)  0.26(0.51+0.02) -29.7%
	1234.2 output size             4.3M             3.6M -14.7%

The item being measured (and its units) is up to the test; the context
and the test title should make it clear to the user whether bigger or
smaller numbers are better. Unlike test_perf, the test code will only be
run once, since output sizes tend to be more deterministic than timings.