Commit Graph

4939 Commits

Author SHA1 Message Date
1cc2c772dd prune-packed: fix minor memory leak
We form all of our directories in a strbuf, but never release it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-15 11:37:43 -07:00
e4a590efa2 builtin/log.c: mark strings for translation
Signed-off-by: Matthias Ruester <matthias.ruester@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-15 11:29:54 -07:00
30d1038d1b fsck: return non-zero status on missing ref tips
Fsck tries hard to detect missing objects, and will complain
(and exit non-zero) about any inter-object links that are
missing. However, it will not exit non-zero for any missing
ref tips, meaning that a severely broken repository may
still pass "git fsck && echo ok".

The problem is that we use for_each_ref to iterate over the
ref tips, which hides broken tips. It does at least print an
error from the refs.c code, but fsck does not ever see the
ref and cannot note the problem in its exit code. We can solve
this by using for_each_rawref and noting the error ourselves.

In addition to adding tests for this case, we add tests for
all types of missing-object links (all of which worked, but
which we were not testing).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-12 10:45:49 -07:00
c1063be2a3 config: avoid a funny sentinel value "a^"
Introduce CONFIG_REGEX_NONE as a more explicit sentinel value to say
"we do not want to replace any existing entry" and use it in the
implementation of "git config --add".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-11 16:33:54 -07:00
5ba9a93b39 hash-object: add --literally option
This allows "hash-object --stdin" to just hash any garbage into a
"loose object" that may not pass the standard object parsing check
or fsck, so that different kind of corrupt objects we may encounter
in the field can be imitated in our test suite.  That would in turn
allow us to test features that catch these corrupt objects.

Note that "cat-file" may need to learn "--literally" option to allow
us peek into a truly broken object.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-11 14:23:51 -07:00
17b787f603 hash-object: pass 'write_object' as a flag
Instead of forcing callers of lower level functions write
(write_object ? HASH_WRITE_OBJECT : 0), prepare the flag to be
passed down in the callchain from the command line parser.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-11 12:48:01 -07:00
b64a984606 hash-object: reduce file-scope statics
Most of the knobs that affect helper functions called from
cmd_hash_object() were passed to them as parameters already, and the
only effect of having them as file-scope statics was to make the
reader wonder if the parameters are hiding the file-scope global
values by accident.  Adjust their initialisation and make them
function-local variables.

The only exception was no_filters hash_stdin_paths() peeked from the
file-scope global, which was converted to a parameter to the helper
function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-11 12:23:42 -07:00
08ad26a63d Merge branch 'nd/fetch-pass-quiet-to-gc-child-process'
Progress output from "git gc --auto" was visible in "git fetch -q".

* nd/fetch-pass-quiet-to-gc-child-process:
  fetch: silence git-gc if --quiet is given
  fetch: convert argv_gc_auto to struct argv_array
2014-09-11 10:33:32 -07:00
3fd13cbcd5 Merge branch 'dt/cache-tree-repair'
Add a few more places in "commit" and "checkout" that make sure
that the cache-tree is fully populated in the index.

* dt/cache-tree-repair:
  cache-tree: do not try to use an invalidated subtree info to build a tree
  cache-tree: Write updated cache-tree after commit
  cache-tree: subdirectory tests
  test-dump-cache-tree: invalid trees are not errors
  cache-tree: create/update cache-tree on checkout
2014-09-11 10:33:32 -07:00
01d678a226 Merge branch 'rs/ref-transaction-1'
The second batch of the transactional ref update series.

* rs/ref-transaction-1: (22 commits)
  update-ref --stdin: pass transaction around explicitly
  update-ref --stdin: narrow scope of err strbuf
  refs.c: make delete_ref use a transaction
  refs.c: make prune_ref use a transaction to delete the ref
  refs.c: remove lock_ref_sha1
  refs.c: remove the update_ref_write function
  refs.c: remove the update_ref_lock function
  refs.c: make lock_ref_sha1 static
  walker.c: use ref transaction for ref updates
  fast-import.c: use a ref transaction when dumping tags
  receive-pack.c: use a reference transaction for updating the refs
  refs.c: change update_ref to use a transaction
  branch.c: use ref transaction for all ref updates
  fast-import.c: change update_branch to use ref transactions
  sequencer.c: use ref transactions for all ref updates
  commit.c: use ref transactions for updates
  replace.c: use the ref transaction functions for updates
  tag.c: use ref transactions when doing updates
  refs.c: add transaction.status and track OPEN/CLOSED
  refs.c: make ref_transaction_begin take an err argument
  ...
2014-09-11 10:33:31 -07:00
5e1dc48858 Merge branch 'nd/mv-code-cleaning'
Code clean-up.

* nd/mv-code-cleaning:
  mv: no SP between function name and the first opening parenthese
  mv: combine two if(s)
  mv: unindent one level for directory move code
  mv: move index search code out
  mv: remove an "if" that's always true
  mv: split submodule move preparation code out
  mv: flatten error handling code block
  mv: mark strings for translations
2014-09-11 10:33:30 -07:00
825fd93767 Merge branch 'rs/child-process-init'
Code clean-up.

* rs/child-process-init:
  run-command: inline prepare_run_command_v_opt()
  run-command: call run_command_v_opt_cd_env() instead of duplicating it
  run-command: introduce child_process_init()
  run-command: introduce CHILD_PROCESS_INIT
2014-09-11 10:33:27 -07:00
554913daf4 Merge branch 'ta/config-set-2'
Update git_config() users with callback functions for a very narrow
scope with calls to config-set API that lets us query a single
variable.

* ta/config-set-2:
  builtin/apply.c: replace `git_config()` with `git_config_get_string_const()`
  merge-recursive.c: replace `git_config()` with `git_config_get_int()`
  ll-merge.c: refactor `read_merge_config()` to use `git_config_string()`
  fast-import.c: replace `git_config()` with `git_config_get_*()` family
  branch.c: replace `git_config()` with `git_config_get_string()
  alias.c: replace `git_config()` with `git_config_get_string()`
  imap-send.c: replace `git_config()` with `git_config_get_*()` family
  pager.c: replace `git_config()` with `git_config_get_value()`
  builtin/gc.c: replace `git_config()` with `git_config_get_*()` family
  rerere.c: replace `git_config()` with `git_config_get_*()` family
  fetchpack.c: replace `git_config()` with `git_config_get_*()` family
  archive.c: replace `git_config()` with `git_config_get_bool()` family
  read-cache.c: replace `git_config()` with `git_config_get_*()` family
  http-backend.c: replace `git_config()` with `git_config_get_bool()` family
  daemon.c: replace `git_config()` with `git_config_get_bool()` family
2014-09-11 10:33:26 -07:00
90a398bbd7 fsck_object(): allow passing object data separately from the object itself
When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').

For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.

By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.

The assumption that such buffers are inherently NUL terminated is now
wrong, of course, hence we pass the size of the buffer so that we can
add a sanity check later, to prevent running past the end of the buffer.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-10 13:54:21 -07:00
2e770fe47e fsck: exit with non-zero status upon error from fsck_obj()
Upon finding a corrupt loose object, we forgot to note the error to
signal it with the exit status of the entire process.

[jc: adjusted t1450 and added another test]

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-10 09:40:53 -07:00
8015a60715 Merge branch 'rs/clean-menu-item-defn'
* rs/clean-menu-item-defn:
  clean: use f(void) instead of f() to declare a pointer to a function without arguments
2014-09-09 12:54:06 -07:00
08668f1802 Merge branch 'sb/mailsplit-dead-code-removal'
* sb/mailsplit-dead-code-removal:
  mailsplit.c: remove dead code
2014-09-09 12:54:04 -07:00
715b63ceb3 Merge branch 'sb/prepare-revision-walk-error-check'
* sb/prepare-revision-walk-error-check:
  prepare_revision_walk(): check for return value in all places
2014-09-09 12:54:03 -07:00
929df991c2 Merge branch 'sb/blame-msg-i18n'
* sb/blame-msg-i18n:
  builtin/blame.c: add translation to warning about failed revision walk
2014-09-09 12:54:03 -07:00
27fbcf8267 Merge branch 'sb/plug-leaks'
* sb/plug-leaks:
  clone.c: don't leak memory in cmd_clone
  remote.c: don't leak the base branch name in format_tracking_info
2014-09-09 12:54:02 -07:00
1bada2b0cc Merge branch 'mm/log-branch-desc-plug-leak'
* mm/log-branch-desc-plug-leak:
  builtin/log.c: fix minor memory leak
2014-09-09 12:53:59 -07:00
ead51a75d5 Merge branch 'jc/apply-ws-prefix'
Applying a patch not generated by Git in a subdirectory used to
check the whitespace breakage using the attributes for incorrect
paths. Also whitespace checks were performed even for paths
excluded via "git apply --exclude=<path>" mechanism.

* jc/apply-ws-prefix:
  apply: omit ws check for excluded paths
  apply: hoist use_patch() helper for path exclusion up
  apply: use the right attribute for paths in non-Git patches
2014-09-09 12:53:58 -07:00
4f1bbd23af mv: no SP between function name and the first opening parenthese
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 15:06:59 -07:00
dcadc8b806 mv: combine two if(s)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 15:06:59 -07:00
b46b15dea0 mv: unindent one level for directory move code
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 15:06:52 -07:00
e2b6cfa02e mv: move index search code out
"Huh?" is removed from die() message.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 14:59:43 -07:00
42de4b169c mv: remove an "if" that's always true
This is inside an "else" block of "if (last - first < 1)", so we know
that "last - first >= 1" when we come here. No need to check
"last - first > 0".

While at there, save "argc + last - first" to a variable to shorten
the statements a bit.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 14:59:43 -07:00
3af05a6d0d mv: split submodule move preparation code out
"Huh?" is removed from die() message.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 14:59:40 -07:00
693eb02a5e calloc() and xcalloc() takes nmemb and then size
There are a handful more instances of this in compat/regex/ but they
are borrowed code taht we do not want to touch with a change that
really affects correctness, which this change is not.

Signed-off-by: Arjun Sreedharan <arjun024@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 14:35:37 -07:00
88499b296b update-ref --stdin: pass transaction around explicitly
This makes it more obvious at a glance where the output of functions
parsing the --stdin stream goes.

No functional change intended.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:04:19 -07:00
ab5ac95725 update-ref --stdin: narrow scope of err strbuf
Making the strbuf local in each function that needs to print errors
saves the reader from having to think about action at a distance,
such as

 * errors piling up and being concatenated with no newline between
   them
 * errors unhandled in one function, to be later handled in another
 * concurrency issues, if this code starts using threads some day

No functional change intended.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:04:19 -07:00
6629ea2d4a receive-pack.c: use a reference transaction for updating the refs
Wrap all the ref updates inside a transaction.

In the new API there is no distinction between failure to lock and
failure to write a ref.  Both can be permanent (e.g., a ref
"refs/heads/topic" is blocking creation of the lock file
"refs/heads/topic/1.lock") or transient (e.g., file system full) and
there's no clear difference in how the client should respond, so
replace the two statuses "failed to lock" and "failed to write" with
a single status "failed to update ref".  In both cases a more
detailed message is sent by sideband to diagnose the problem.

Example, before:

 error: there are still refs under 'refs/heads/topic'
 remote: error: failed to lock refs/heads/topic
 To foo
  ! [remote rejected] HEAD -> topic (failed to lock)

After:

 error: there are still refs under 'refs/heads/topic'
 remote: error: Cannot lock the ref 'refs/heads/topic'.
 To foo
  ! [remote rejected] HEAD -> topic (failed to update ref)

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:04:14 -07:00
c0fe1ed084 commit.c: use ref transactions for updates
Change commit.c to use ref transactions for all ref updates.
Make sure we pass a NULL pointer to ref_transaction_update if have_old
is false.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:04:11 -07:00
867c2fac0a replace.c: use the ref transaction functions for updates
Update replace.c to use ref transactions for updates.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:04:10 -07:00
e5074bfe8c tag.c: use ref transactions when doing updates
Change tag.c to use ref transactions for all ref updates.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:04:10 -07:00
93a644ea9d refs.c: make ref_transaction_begin take an err argument
Add an err argument to _begin so that on non-fatal failures in future ref
backends we can report a nice error back to the caller.
While _begin can currently never fail for other reasons than OOM, in which
case we die() anyway, we may add other types of backends in the future.
For example, a hypothetical MySQL backend could fail in _begin with
"Can not connect to MySQL server. No route to host".

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:04:08 -07:00
8c8bdc0d35 refs.c: update ref_transaction_delete to check for error and return status
Change ref_transaction_delete() to do basic error checking and return
non-zero on error. Update all callers to check the return for
ref_transaction_delete(). There are currently no conditions in _delete that
will return error but there will be in the future. Add an err argument that
will be updated on failure.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:04:08 -07:00
b416af5bcd refs.c: change ref_transaction_create to do error checking and return status
Do basic error checking in ref_transaction_create() and make it return
non-zero on error. Update all callers to check the result of
ref_transaction_create(). There are currently no conditions in _create that
will return error but there will be in the future. Add an err argument that
will be updated on failure.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 10:04:07 -07:00
f655651e09 Merge branch 'rs/strbuf-getcwd'
Reduce the use of fixed sized buffer passed to getcwd() calls
by introducing xgetcwd() helper.

* rs/strbuf-getcwd:
  use strbuf_add_absolute_path() to add absolute paths
  abspath: convert absolute_path() to strbuf
  use xgetcwd() to set $GIT_DIR
  use xgetcwd() to get the current directory or die
  wrapper: add xgetcwd()
  abspath: convert real_path_internal() to strbuf
  abspath: use strbuf_getcwd() to remember original working directory
  setup: convert setup_git_directory_gently_1 et al. to strbuf
  unix-sockets: use strbuf_getcwd()
  strbuf: add strbuf_getcwd()
2014-09-02 13:28:44 -07:00
e8e4ce72cd Merge branch 'rs/init-no-duplicate-real-path'
* rs/init-no-duplicate-real-path:
  init: avoid superfluous real_path() calls
2014-09-02 13:24:05 -07:00
1d8a6f6929 Merge branch 'mm/config-edit-global'
Start "git config --edit --global" from a skeletal per-user
configuration file contents, instead of a total blank, when the
user does not already have any.  This immediately reduces the need
for a later "Have you forgotten setting core.user?" and we can add
more to the template as we gain more experience.

* mm/config-edit-global:
  commit: advertise config --global --edit on guessed identity
  home_config_paths(): let the caller ignore xdg path
  config --global --edit: create a template file if needed
2014-09-02 13:23:20 -07:00
ad5fe3771b grammofix in user-facing messages
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Acked-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-02 12:00:30 -07:00
24d36f1472 stylefix: asterisks stick to the variable, not the type
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-02 11:33:32 -07:00
57065289a9 merge-tree: remove unused df_conflict arguments
merge_trees_recursive() stores a pointer to its parameter df_conflict in
its struct traverse_info, but it is never actually used.  Stop doing
that, remove the parameter and inline the function into merge_trees(),
as the latter is now only passing on its parameters.

Remove the parameter df_conflict from unresolved_directory() as well,
now that there is no way to pass it to merge_trees_recursive() through
that function anymore.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-02 11:02:58 -07:00
ab791dd138 index-pack: fix race condition with duplicate bases
When we are resolving deltas in an indexed pack, we do it by
first selecting a potential base (either one stored in full
in the pack, or one created by resolving another delta), and
then resolving any deltas that use that base.  When we
resolve a particular delta, we flip its "real_type" field
from OBJ_{REF,OFS}_DELTA to whatever the real type is.

We assume that traversing the objects this way will visit
each delta only once. This is correct for most packs; we
visit the delta only when we process its base, and each
object (and thus each base) appears only once. However, if a
base object appears multiple times in the pack, we will try
to resolve any deltas based on it once for each instance.

We can detect this case by noting that a delta we are about
to resolve has already had its real_type field flipped, and
we already do so with an assert().  However, if multiple
threads are in use, we may race with another thread on
comparing and flipping the field. We need to synchronize the
access.

The right mechanism for doing this is a compare-and-swap (we
atomically "claim" the delta for our own and find out
whether our claim was successful). We can implement this
in C by using a pthread mutex to protect the operation. This
is not the fastest way of doing a compare-and-swap; many
processors provide instructions for this, and gcc and other
compilers provide builtins to access them. However, some
experiments showed that lock contention does not cause a
significant slowdown here. Adding c-a-s support for many
compilers would increase the maintenance burden (and we
would still end up including the pthread version as a
fallback).

Note that we only need to touch the OBJ_REF_DELTA codepath
here. An OBJ_OFS_DELTA object points to its base using an
offset, and therefore has only one base, even if another
copy of that base object appears in the pack (we do still
touch it briefly because the setting of real_type is
factored out of resolve_data).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 14:50:43 -07:00
466fb6742d pretty: provide a strict ISO 8601 date format
Git's "ISO" date format does not really conform to the ISO 8601
standard due to small differences, and it cannot be parsed by ISO
8601-only parsers, e.g. those of XML toolchains.

The output from "--date=iso" deviates from ISO 8601 in these ways:

  - a space instead of the `T` date/time delimiter
  - a space between time and time zone
  - no colon between hours and minutes of the time zone

Add a strict ISO 8601 date format for displaying committer and
author dates.  Use the '%aI' and '%cI' format specifiers and add
'--date=iso-strict' or '--date=iso8601-strict' date format names.

See http://thread.gmane.org/gmane.comp.version-control.git/255879 and
http://thread.gmane.org/gmane.comp.version-control.git/52414/focus=52585
for discussion.

Signed-off-by: Beat Bolli <bbolli@ewanet.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 12:37:02 -07:00
f4ef517393 determine_author_info(): copy getenv output
When figuring out the author name for a commit, we may end
up either pointing to const storage from getenv("GIT_AUTHOR_*"),
or to newly allocated storage based on an existing commit or
the --author option.

Using const pointers to getenv's return has two problems:

  1. It is not guaranteed that the return value from getenv
     remains valid across multiple calls.

  2. We do not know whether to free the values at the end,
     so we just leak them.

We can solve both by duplicating the string returned by
getenv().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 10:33:28 -07:00
f0f9662ae9 determine_author_info(): reuse parsing functions
Rather than parsing the header manually to find the "author"
field, and then parsing its sub-parts, let's use
find_commit_header and split_ident_line. This is shorter and
easier to read, and should do a more careful parsing job.

For example, the current parser could find the end-of-email
right-bracket across a newline (for a malformed commit), and
calculate a bogus gigantic length for the date (by using
"eol - rb").

As a bonus, this also plugs a memory leak when we pull the
date field from an existing commit (we still leak the name
and email buffers, which will be fixed in a later commit).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 10:33:28 -07:00
a872275098 teach fast-export an --anonymize option
Sometimes users want to report a bug they experience on
their repository, but they are not at liberty to share the
contents of the repository. It would be useful if they could
produce a repository that has a similar shape to its history
and tree, but without leaking any information. This
"anonymized" repository could then be shared with developers
(assuming it still replicates the original problem).

This patch implements an "--anonymize" option to
fast-export, which generates a stream that can recreate such
a repository. Producing a single stream makes it easy for
the caller to verify that they are not leaking any useful
information. You can get an overview of what will be shared
by running a command like:

  git fast-export --anonymize --all |
  perl -pe 's/\d+/X/g' |
  sort -u |
  less

which will show every unique line we generate, modulo any
numbers (each anonymized token is assigned a number, like
"User 0", and we replace it consistently in the output).

In addition to anonymizing, this produces test cases that
are relatively small (compared to the original repository)
and fast to generate (compared to using filter-branch, or
modifying the output of fast-export yourself). Here are
numbers for git.git:

  $ time git fast-export --anonymize --all \
         --tag-of-filtered-object=drop >output
  real    0m2.883s
  user    0m2.828s
  sys     0m0.052s

  $ gzip output
  $ ls -lh output.gz | awk '{print $5}'
  2.9M

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-27 10:42:16 -07:00
c33ddc2e33 date: use strbufs in date-formatting functions
Many of the date functions write into fixed-size buffers.
This is a minor pain, as we have to take special
precautions, and frequently end up copying the result into a
strbuf or heap-allocated buffer anyway (for which we
sometimes use strcpy!).

Let's instead teach parse_date, datestamp, etc to write to a
strbuf. The obvious downside is that we might need to
perform a heap allocation where we otherwise would not need
to. However, it turns out that the only two new allocations
required are:

  1. In test-date.c, where we don't care about efficiency.

  2. In determine_author_info, which is not performance
     critical (and where the use of a strbuf will help later
     refactoring).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-27 10:32:56 -07:00