rerere: add documentation for conflict normalization
Add some documentation for the logic behind the conflict normalization
in rerere.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

committed by
Junio C Hamano

parent
2373b65059
commit
fb90dca34c
140
Documentation/technical/rerere.txt
Normal file
@@ -0,0 +1,140 @@
Rerere
======

This document describes the rerere logic.

Conflict normalization
----------------------

To ensure recorded conflict resolutions can be looked up in the rerere
database, even when branches are merged in a different order,
different branches are merged that result in the same conflict, or
when different conflict style settings are used, rerere normalizes the
conflicts before writing them to the rerere database.

Different conflict styles and branch names are normalized by stripping
the labels from the conflict markers, and removing the common ancestor
version from the `diff3` conflict style. Branches that are merged
in different order are normalized by sorting the conflict hunks. More
on each of those steps in the following sections.

Once these two normalization operations are applied, a conflict ID is
calculated based on the normalized conflict, which is later used by
rerere to look up the conflict in the rerere database.

Removing the common ancestor version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Say we have three branches AB, AC and AC2. The common ancestor of
these branches has a file with a line containing the string "A" (for
brevity this is called "line A" in the rest of the document). In
branch AB this line is changed to "B", in AC, this line is changed to
"C", and branch AC2 is forked off of AC, after the line was changed to
"C".

Forking a branch ABAC off of branch AB and then merging AC into it, we
get a conflict like the following:

    <<<<<<< HEAD
    B
    =======
    C
    >>>>>>> AC

Doing the same with AC2 (forking a branch ABAC2 off of branch AB
and then merging branch AC2 into it), using the diff3 conflict style,
we get a conflict like the following:

    <<<<<<< HEAD
    B
    ||||||| merged common ancestors
    A
    =======
    C
    >>>>>>> AC2

By resolving this conflict to leave line D, the user declares:

    After examining what branches AB and AC did, I believe that making
    line A into line D is the best thing to do that is compatible with
    what AB and AC wanted to do.

As branch AC2 refers to the same commit as AC, the above implies that
this is also compatible with what AB and AC2 wanted to do.

By extension, this means that rerere should recognize that the above
conflicts are the same. To do this, the labels on the conflict
markers are stripped, and the common ancestor version is removed. The
above examples would both result in the following normalized conflict:

    <<<<<<<
    B
    =======
    C
    >>>>>>>
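
The label stripping and ancestor removal described above can be
sketched in Python. This is an illustration only, not the code in
rerere.c: the function name `normalize_conflict`, the list-of-lines
input, and the default marker size of 7 are assumptions made for the
example, and lines outside a conflict hunk are simply passed through.

```python
def normalize_conflict(lines, marker_size=7):
    """Strip marker labels and drop the diff3 common ancestor part."""
    start = "<" * marker_size
    base = "|" * marker_size
    mid = "=" * marker_size
    end = ">" * marker_size

    def is_marker(line, marker):
        # A marker line is the bare marker, optionally followed by
        # a space and a label such as "HEAD" or a branch name.
        return line == marker or line.startswith(marker + " ")

    out = []
    side = None  # None outside a hunk, else "ours", "base" or "theirs"
    for line in lines:
        if side is None and is_marker(line, start):
            side = "ours"
            out.append(start)          # label removed
        elif side == "ours" and is_marker(line, base):
            side = "base"              # ancestor marker itself is dropped
        elif side in ("ours", "base") and is_marker(line, mid):
            side = "theirs"
            out.append(mid)
        elif side == "theirs" and is_marker(line, end):
            side = None
            out.append(end)            # label removed
        elif side != "base":
            out.append(line)           # keep everything but ancestor lines
    return out


merge_style = ["<<<<<<< HEAD", "B", "=======", "C", ">>>>>>> AC"]
diff3_style = ["<<<<<<< HEAD", "B", "||||||| merged common ancestors",
               "A", "=======", "C", ">>>>>>> AC2"]

# Both conflict styles collapse to the same normalized conflict.
assert normalize_conflict(merge_style) == normalize_conflict(diff3_style)
```

Tracking which side of the hunk is being read keeps a stray line of
equals signs outside a hunk from being mistaken for a marker.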

Sorting hunks
~~~~~~~~~~~~~

As before, let's imagine that a common ancestor had a file with line A
in its early part, and line X in its late part. And then four branches
are forked that do these things:

- AB: changes A to B
- AC: changes A to C
- XY: changes X to Y
- XZ: changes X to Z

Now, forking a branch ABAC off of branch AB and then merging AC into
it, and forking a branch ACAB off of branch AC and then merging AB
into it, would yield the conflict in a different order. The former
would say "A became B or C, what now?" while the latter would say "A
became C or B, what now?"

As a reminder, the act of merging AC into ABAC and resolving the
conflict to leave line D means that the user declares:

    After examining what branches AB and AC did, I believe that
    making line A into line D is the best thing to do that is
    compatible with what AB and AC wanted to do.

So the conflict we would see when merging AB into ACAB should be
resolved the same way---it is the resolution that is in line with that
declaration.

Similarly, imagine that a branch XYXZ was previously forked from XY,
XZ was merged into it, and the user resolved "X became Y or Z" into
"X became W".

Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
would have line B in its early part and line Y in its later part.
Such a merge would be quite clean. We can construct 4 combinations
using these four branches ((AB, AC) x (XY, XZ)).

Merging ABXY and ACXZ would produce the conflict "an early A became B
or C, a late X became Y or Z", while merging ACXY and ABXZ would
produce "an early A became C or B, a late X became Y or Z". We can see
there are 4 combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y").

By sorting, the conflict is given its canonical name, namely, "an
early part became B or C, a late part became Y or Z", and whenever
any of these four patterns appears, we can get to the same conflict
and resolution that we saw earlier.

Without the sorting, we'd have to somehow find a previous resolution
among a combinatorial explosion of equivalent conflicts.
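
The effect of sorting can be sketched in Python. The sketch assumes
that "sorting" means putting the two sides of each hunk into a fixed
order, which is how the examples above read; the helper name
`sort_sides` and the tuple representation of a hunk are hypothetical
and are not taken from rerere.c.

```python
from itertools import product


def sort_sides(hunks):
    """Return the hunks with the two sides of each hunk in sorted order."""
    return [tuple(sorted(hunk)) for hunk in hunks]


# The two possible orderings of each conflicting region.
early = [("B", "C"), ("C", "B")]
late = [("Y", "Z"), ("Z", "Y")]

# All four combinations collapse to one canonical form after sorting.
canonical = {tuple(sort_sides([e, l])) for e, l in product(early, late)}
assert canonical == {(("B", "C"), ("Y", "Z"))}
```

With a single canonical form, one recorded resolution can be found
regardless of which of the four merges produced the conflict.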

Conflict ID calculation
~~~~~~~~~~~~~~~~~~~~~~~

Once the conflict normalization is done, the conflict ID is calculated
as the sha1 hash of the conflict hunks appended to each other,
separated by <NUL> characters. The conflict markers are stripped out
before the sha1 is calculated. So in the example above, where we
merge branch AC, which changes line A to line C, into branch AB, which
changes line A to line B, the conflict ID would be
SHA1('B<NUL>C<NUL>').

If there are multiple conflicts in one file, the sha1 is calculated
the same way with all hunks appended to each other, in the order in
which they appear in the file, separated by a <NUL> character.
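
The ID calculation can be sketched in Python with the standard
`hashlib` module. The function name `conflict_id` and the
representation of a hunk as a pair of byte strings are assumptions
made for this example. It follows the simplified notation used above,
where a side is just `B` or `C`; whether the real implementation
includes line terminators in the hashed data is not covered here.

```python
import hashlib


def conflict_id(hunks):
    """Hash already-normalized hunks: each side followed by a NUL byte."""
    sha = hashlib.sha1()
    for first, second in hunks:      # hunks in file order, sides pre-sorted
        sha.update(first + b"\0")
        sha.update(second + b"\0")
    return sha.hexdigest()


# Single hunk from the example: equivalent to SHA1('B<NUL>C<NUL>').
single = conflict_id([(b"B", b"C")])
assert single == hashlib.sha1(b"B\0C\0").hexdigest()

# Two hunks in one file are hashed as one concatenated stream.
double = conflict_id([(b"B", b"C"), (b"Y", b"Z")])
assert double == hashlib.sha1(b"B\0C\0Y\0Z\0").hexdigest()
```

Because the markers and labels never reach the hash, two conflicts
that normalize to the same hunks get the same ID.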
4
rerere.c
@@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * and NUL concatenated together.
  *
  * Return the number of conflict hunks found.
- *
- * NEEDSWORK: the logic and theory of operation behind this conflict
- * normalization may deserve to be documented somewhere, perhaps in
- * Documentation/technical/rerere.txt.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {