Jeff King 0750bb5b51 cat-file: support "unordered" output for --batch-all-objects
If you're going to access the contents of every object in a
packfile, it's generally much more efficient to do so in
pack order, rather than in hash order. That increases the
locality of access within the packfile, which in turn is
friendlier to the delta base cache, since the packfile puts
related deltas next to each other. By contrast, hash order
is effectively random, since the sha1 has no discernible
relationship to the content.
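
To make "pack order" concrete, here is a minimal C sketch. It assumes a simplified, hypothetical in-memory view of a pack index: `struct idx_entry` and `build_revindex` are illustrative names, not Git's API. A pack's .idx lists objects sorted by object name. Sorting those index positions by pack offset yields a reverse index, and walking that visits objects in the order they sit in the packfile. It uses qsort() for brevity and is not Git's actual revindex code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified pack index entry (not Git's real layout). */
struct idx_entry {
	unsigned char oid[20]; /* object name; .idx entries are sorted by this */
	uint64_t offset;       /* byte offset of the object in the .pack file */
};

/* qsort() has no context argument, so the comparator reads the entry
 * table through this file-scope pointer (not thread-safe). */
static const struct idx_entry *revindex_entries;

static int cmp_by_offset(const void *a, const void *b)
{
	uint64_t oa = revindex_entries[*(const uint32_t *)a].offset;
	uint64_t ob = revindex_entries[*(const uint32_t *)b].offset;
	return (oa > ob) - (oa < ob);
}

/*
 * Build a reverse index: rev[i] is the index position of the i-th
 * object in pack order. Visiting entries[rev[0]], entries[rev[1]], ...
 * reads the packfile front to back instead of jumping around in hash
 * order. Returns NULL on allocation failure; the caller frees it.
 */
uint32_t *build_revindex(const struct idx_entry *entries, uint32_t nr)
{
	uint32_t *rev = malloc((nr ? nr : 1) * sizeof(*rev));
	if (!rev)
		return NULL;
	for (uint32_t i = 0; i < nr; i++)
		rev[i] = i; /* start in hash (index) order */
	revindex_entries = entries;
	qsort(rev, nr, sizeof(*rev), cmp_by_offset);
	return rev;
}
```

Suppose entries at index positions 0, 1 and 2 have offsets 300, 12 and 150. Then build_revindex returns {1, 2, 0}, which is the order those objects appear in the pack.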

This patch introduces an "--unordered" option to cat-file
which iterates over packs in pack-order under the hood. You
can see the results when dumping all of the file content:

  $ time ./git cat-file --batch-all-objects --buffer --batch | wc -c
  6883195596

  real	0m44.491s
  user	0m42.902s
  sys	0m5.230s

  $ time ./git cat-file --unordered \
                        --batch-all-objects --buffer --batch | wc -c
  6883195596

  real	0m6.075s
  user	0m4.774s
  sys	0m3.548s

Same output, different order, way faster. The same speed-up
applies even if you end up accessing the object content in a
different process, like:

  git cat-file --batch-all-objects --buffer --batch-check |
  grep blob |
  git cat-file --batch='%(objectname) %(rest)' |
  wc -c

Adding "--unordered" to the first command drops the runtime
in git.git from 24s to 3.5s.

  Side note: there are actually further speedups available
  for doing it all in-process now. Since we are outputting
  the object content during the actual pack iteration, we
  know where to find the object and could skip the extra
  lookup done by oid_object_info(). This patch stops short
  of that optimization since the underlying API isn't ready
  for us to make those sorts of direct requests.

So if --unordered is so much better, why not make it the
default? Two reasons:

  1. We've promised in the documentation that --batch-all-objects
     outputs in hash order. Since cat-file is plumbing,
     people may be relying on that default, and we can't
     change it.

  2. It's actually _slower_ for some cases. We have to
     compute the pack revindex to walk in pack order. And
     our de-duplication step uses an oidset rather than a
     sort-and-dedup, and the oidset can end up being more
     expensive.
     If we're just accessing the type and size of each
     object, for example, like:

       git cat-file --batch-all-objects --buffer --batch-check

     my best-of-five warm cache timings go from 900ms to
     1100ms using --unordered. It's possible, though, that
     with a cold cache or under memory pressure we could do
     better, since we'd have better locality within the
     packfile.
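
For comparison, a "sort-and-dedup" step like the one mentioned in point 2 might look like the following minimal C sketch. It assumes 20-byte SHA-1 object names. `struct oid`, `OID_RAWSZ` and `sort_and_dedup` are hypothetical names, and this is not Git's code. Names gathered from loose objects and several packs may repeat. Sorting brings duplicates together so that one linear pass can drop them, and the survivors come out in hash order as a side effect.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define OID_RAWSZ 20 /* assumed: raw SHA-1 object name length */

/* Hypothetical object-name type. */
struct oid {
	unsigned char hash[OID_RAWSZ];
};

static int cmp_oid(const void *a, const void *b)
{
	const struct oid *x = a, *y = b;
	return memcmp(x->hash, y->hash, OID_RAWSZ);
}

/*
 * Sort-and-dedup: sort the collected names so duplicates become
 * adjacent, then compact the array in place keeping one copy of each.
 * Returns the number of unique names; the survivors are in hash order.
 */
size_t sort_and_dedup(struct oid *oids, size_t nr)
{
	size_t dst = 0;

	if (!nr)
		return 0;
	qsort(oids, nr, sizeof(*oids), cmp_oid);
	for (size_t src = 1; src < nr; src++) {
		/* keep oids[src] only if it differs from the last kept name */
		if (memcmp(oids[dst].hash, oids[src].hash, OID_RAWSZ))
			oids[++dst] = oids[src];
	}
	return dst + 1;
}
```

The sort costs O(n log n), but afterwards the duplicate check is a single comparison against the previous survivor. By contrast, a hash-set approach like the oidset pays for a lookup and insertion per object.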

And one final question: why is it "--unordered" and not
"--pack-order"? The answer is again two-fold:

  1. "pack order" isn't a well-defined thing across the
     whole set of objects. We're hitting loose objects, as
     well as objects in multiple packs, and the only
     ordering we're promising is _within_ a single pack. The
     rest is apparently random.

  2. The point here is optimization. So we don't want to
     promise any particular ordering, but only to say that
     we will choose an ordering which is likely to be
     efficient for accessing the object content. That leaves
     the door open for further changes in the future without
     having to add another compatibility option.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 13:48:31 -07:00

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://public-inbox.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks