5c49c11686df9d1c27a194349d0b2092e6446f42
With a large number of objects, check_object() really thrashes the pack
sliding map and the filesystem cache. Its access pattern is completely
random, especially with old objects, where delta replay jumps back and
forth all over the pack.
This patch improves things by:
1) sorting objects by their offset in the pack before calling
check_object(), so the pack access pattern is linear;
2) recording the object type at add_object_entry() time, since it is
already known in most cases;
3) recording the pack offset even for preferred_base objects;
4) avoiding calls to sha1_object_info() if at all possible.
This limits pack accesses to the bare minimum and makes them perfectly
linear.
In the process, check_object() was also made clearer (to me at least).
Note: I thought about walking the sorted_by_offset list backward in
get_object_details(), so that if a pack happens to be larger than the
available file cache, the cache would already be populated with useful
data from the beginning of the pack when find_deltas() is called.
Strangely, testing (on Linux) showed absolutely no performance difference.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
////////////////////////////////////////////////////////////////

        GIT - the stupid content tracker

////////////////////////////////////////////////////////////////

"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command. The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.

Git is an Open Source project covered by the GNU General Public License.
It was originally written by Linus Torvalds with help of a group of
hackers around the net. It is currently maintained by Junio C Hamano.

Please read the file INSTALL for installation instructions.

See Documentation/tutorial.txt to get started, then see
Documentation/everyday.txt for a useful minimum set of commands, and
"man git-commandname" for documentation of each command.
CVS users may also want to read Documentation/cvs-migration.txt.

Many Git online resources are accessible from http://git.or.cz/
including full documentation and Git related tools.

The user discussion and development of Git take place on the Git
mailing list -- everyone is welcome to post bug reports, feature
requests, comments and patches to git@vger.kernel.org. To subscribe
to the list, send an email with just "subscribe git" in the body to
majordomo@vger.kernel.org. The mailing list archives are available at
http://marc.theaimsgroup.com/?l=git and other archival sites.

The messages titled "A note from the maintainer", "What's in
git.git (stable)" and "What's cooking in git.git (topics)" and
the discussion following them on the mailing list give a good
reference for project status, development direction and
remaining tasks.