The fsck code operates on an object buffer represented as a pointer/len combination. However, the parsing of commits and tags is a little bit loose; we mostly scan left-to-right through the buffer, without checking whether we've gone past the length we were given.

This has traditionally been OK because the buffers we feed to fsck always have an extra NUL after the end of the object content, which ends any left-to-right scan. That has always been true for objects we read from the odb, and we made it true for incoming index-pack/unpack-objects checks in a1e920a0a7 (index-pack: terminate object buffers with NUL, 2014-12-08).

However, we recently added an exception: hash-object asks index_fd() to do fsck checks. That _may_ have an extra NUL (if we read from a pipe into a strbuf), but it might not (if we read the contents from the file). Nor can we just teach it to always add a NUL. We may mmap the on-disk file, which will not have any extra bytes (if its size is an exact multiple of the page size). Not to mention that this is a rather subtle assumption for the fsck code to make.

Instead, let's make sure that the fsck parsers don't ever look past the size of the buffer they've been given.

This _almost_ works already, thanks to earlier work in 4d0d89755e (Make sure fsck_commit_buffer() does not run out of the buffer, 2014-09-11). The theory there is that we check up front whether we have the end-of-header double-newline separator. And then any left-to-right scanning we do is OK as long as it stops when it hits that boundary.

However, we later softened that in 84d18c0bcf (fsck: it is OK for a tag and a commit to lack the body, 2015-06-28), which allows the double-newline separator to be missing, but does require that the header ends in a newline. That was OK back then, because of the NUL-termination guarantees (including the one from a1e920a0a7 mentioned above).

Because 84d18c0bcf guarantees that any header line does end in a newline, we are still OK with most of the left-to-right scanning.
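To make "never look past the size of the buffer" concrete, here is a minimal sketch of a header-line parser that only ever touches bytes in [pos, end). It is not Git's fsck code: `parse_header_line()` and its return convention are hypothetical. It shows three habits this message relies on: a pointer comparison to ask whether another line exists, a length check before matching a keyword, and memchr() with an explicit size in place of strchr().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's API): if the line at *pos starts with
 * `key` (e.g. "tree "), return the rest of the line, minus its '\n', in
 * *value / *value_len and advance *pos past the newline.
 *
 * Returns 0 on success, -1 if there is no further line, the key does not
 * match, or no newline occurs before `end`. Every access stays within
 * [*pos, end), so the buffer need not be NUL-terminated.
 */
static int parse_header_line(const char **pos, const char *end,
			     const char *key,
			     const char **value, size_t *value_len)
{
	const char *p = *pos;
	size_t keylen = strlen(key);
	const char *eol;

	/* Is there another line at all? A pointer comparison, not "!*p". */
	if (p >= end)
		return -1;

	/* Enough bytes left for the key, and does it match? */
	if ((size_t)(end - p) < keylen || memcmp(p, key, keylen))
		return -1;
	p += keylen;

	/* Bounded newline search: memchr() instead of strchr(). */
	eol = memchr(p, '\n', (size_t)(end - p));
	if (!eol)
		return -1;

	*value = p;
	*value_len = (size_t)(eol - p);
	*pos = eol + 1;	/* only advance on success */
	return 0;
}
```

Because *pos only advances on success, a caller can probe for an optional, repeatable line such as "parent " in a loop and simply fall through when it stops matching.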
We only need to take care after completing a line, to check that there is another line (and that we didn't run out of buffer). Most of these checks just need to test "buffer < buffer_end" (where buffer is advanced as we parse) before scanning for the next header line. But here are a few notes:

- we don't technically need to check for remaining buffer before parsing the very first line ("tree" for a commit, or "object" for a tag), because verify_headers() rejects a totally empty buffer. But we'll do so in the name of consistency and defensiveness.

- there are some calls to strchr('\n'). These are actually OK because of the guarantee from verify_headers() that the final header line must end in a newline: they will always find that newline rather than run off the end of the buffer. Curiously, they do check for a NULL return and complain, but I believe that condition can never be reached. However, I converted them to use memchr() with a proper size and retained the NULL checks. Using memchr() is not much longer and makes it more obvious what is going on. Likewise, retaining the NULL checks serves as a defensive measure in case my analysis is wrong.

- commit 9a1a3a4d4c (mktag: allow omitting the header/body \n separator, 2021-01-05) does check for the end-of-buffer condition, but does so with "!*buffer", relying explicitly on the NUL termination. We can accomplish the same thing with a pointer comparison. I also folded it into the follow-on conditional that checks the contents of the buffer, for consistency with the other checks.

- fsck_ident() uses parse_timestamp(), which is based on strtoumax(). That function will happily skip past leading whitespace, including newlines, which makes it a risk. We can fix this by scanning to the first digit ourselves, and then using parse_timestamp() to do the actual numeric conversion. Note that as a side effect this fixes the fact that we missed zero-padded timestamps preceded by extra whitespace, like "<email>   0123" (whereas we would complain about "<email> 0123").
  I doubt anybody cares, but I mention it here for completeness.

- fsck_tree() does not need any modifications. It relies on decode_tree_entry() to do the actual parsing, and that function checks both that there are enough bytes in the buffer to represent an entry, and that there is a NUL at the appropriate spot (one hash-length from the end; this may not be the NUL for the entry we are parsing, but we know that in the worst case, everything from our current position to that NUL is a filename, so we won't run out of bytes).

In addition to fixing the code itself, we'd like to make sure our rather subtle assumptions are not violated in the future. So this patch does two more things:

- add comments around verify_headers() documenting the link between what it checks and the memory safety of the callers. I don't expect this code to be modified frequently, but this may keep somebody from accidentally breaking things.

- add a thorough set of tests covering truncations at various key spots (e.g., for a "tree $oid" line: in the middle of the word "tree", right after it, after the space, in the middle of the $oid, and right at the end of the line). Most of these are fine already (it is only truncating right at the end of the line that is currently broken). And some of them are not even possible with the current code (we parse "tree " as a unit, so truncating right before the space is equivalent to truncating in the middle of the word). But I aimed here to consider the code a black box and look for any truncations that would be a problem for a left-to-right parser.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
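The fsck_ident() change can be sketched as follows. This is a hypothetical stand-alone function, not Git's code: `parse_ident_timestamp()` and its return codes are invented for illustration, and it calls strtoumax() directly where Git uses parse_timestamp(). It assumes `p` points just past the closing ">" of the email and, mirroring what verify_headers() guarantees, insists that a newline exists before `end`, so both the manual whitespace skip and the later strtoumax() call stop at that newline at the latest.

```c
#include <ctype.h>
#include <errno.h>
#include <inttypes.h>
#include <string.h>

/*
 * Hypothetical sketch of a bounds-safe ident timestamp check.
 * Returns 0 and stores the value in *out on success, -1 for a malformed
 * date, -2 for a zero-padded date, -3 on overflow.
 */
static int parse_ident_timestamp(const char *p, const char *end,
				 uintmax_t *out)
{
	char *digits_end;

	/*
	 * Require a newline before `end` (the verify_headers() invariant).
	 * Every scan below stops at that newline at the latest.
	 */
	if (p >= end || !memchr(p, '\n', (size_t)(end - p)))
		return -1;

	/* Skip spaces and tabs ourselves; unlike strtoumax(), never '\n'. */
	while (*p == ' ' || *p == '\t')
		p++;

	if (!isdigit((unsigned char)*p))
		return -1;
	/* p[1] is in bounds: *p is a digit, so the newline is still ahead. */
	if (*p == '0' && isdigit((unsigned char)p[1]))
		return -2;

	/* Only now hand the digits to strtoumax() for the conversion. */
	errno = 0;
	*out = strtoumax(p, &digits_end, 10);
	if (errno == ERANGE)
		return -3;
	/* The timestamp must be followed by a space and then the timezone. */
	if (*digits_end != ' ')
		return -1;
	return 0;
}
```

With this ordering, an input like " \n1234 +0000\n" is rejected at the newline instead of strtoumax() silently skipping it, and "   0123" is reported as zero-padded even with the extra leading spaces.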
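The argument for why fsck_tree() needs no change rests on the two checks described for decode_tree_entry(). The following is a simplified, hypothetical re-creation of that argument rather than Git's implementation: `decode_one_tree_entry()` and `RAW_HASH_LEN` are illustrative names, and it assumes 20-byte (SHA-1) raw object IDs. Once a NUL is known to sit just before the final hash, strlen() on the name cannot run past the buffer, even when that NUL belongs to a later entry.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: SHA-1 sized raw object IDs. */
#define RAW_HASH_LEN 20

/*
 * Hypothetical sketch: decode one raw tree entry
 * "<octal mode> <name>\0<hash>" from buf[0..size). Returns the entry's
 * total length, or 0 if it is malformed or would not fit.
 */
static size_t decode_one_tree_entry(const char *buf, size_t size,
				    const char **name,
				    const unsigned char **hash)
{
	const char *sp, *c;
	size_t namelen;

	/*
	 * Smallest entry: one mode digit, a space, an empty name, NUL, hash.
	 * The byte just before the final hash must be a NUL; it bounds every
	 * scan below, even if it terminates a later entry's name.
	 */
	if (size < RAW_HASH_LEN + 3 || buf[size - (RAW_HASH_LEN + 1)] != '\0')
		return 0;

	/* The mode is octal digits ended by a space found before that NUL. */
	sp = memchr(buf, ' ', size - (RAW_HASH_LEN + 1));
	if (!sp || sp == buf)
		return 0;
	for (c = buf; c < sp; c++)
		if (*c < '0' || *c > '7')
			return 0;

	/* strlen() is safe: a NUL exists at or after sp + 1, inside buf. */
	*name = sp + 1;
	namelen = strlen(*name);
	*hash = (const unsigned char *)(*name + namelen + 1);
	return (size_t)(*hash - (const unsigned char *)buf) + RAW_HASH_LEN;
}
```

In the worst case the name runs all the way to the NUL before the final hash, which still leaves exactly RAW_HASH_LEN bytes for the hash, so the returned length never exceeds `size`.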
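The truncation tests treat the parser as a black box. The patch's own tests are not reproduced here; instead, this is a hypothetical C harness showing the same idea on a toy two-line header checker (`check_line()`, `check_tag_header()` and `count_accepted_truncations()` are all illustrative names). Each prefix is copied into an exactly sized heap buffer with no trailing NUL, so running it under a memory checker would report any over-read, and the return value counts truncated inputs that were wrongly accepted.

```c
#include <stdlib.h>
#include <string.h>

/* Toy checker: "<key><non-empty value>\n", reading only [*p, end). */
static int check_line(const char **p, const char *end, const char *key)
{
	size_t keylen = strlen(key);
	const char *val, *eol;

	if (*p >= end || (size_t)(end - *p) < keylen ||
	    memcmp(*p, key, keylen))
		return -1;
	val = *p + keylen;
	eol = memchr(val, '\n', (size_t)(end - val));
	if (!eol || eol == val)	/* no newline, or empty value */
		return -1;
	*p = eol + 1;
	return 0;
}

/* Accept "object <value>\n" followed by "type <value>\n". */
static int check_tag_header(const char *buf, size_t len)
{
	const char *p = buf, *end = buf + len;

	if (check_line(&p, end, "object ") || check_line(&p, end, "type "))
		return -1;
	return 0;
}

/*
 * Try every proper prefix of `full`, each in a heap buffer of exactly
 * that size. Returns how many truncated inputs were accepted.
 */
static int count_accepted_truncations(const char *full)
{
	size_t n = strlen(full), cut;
	int accepted = 0;

	for (cut = 0; cut < n; cut++) {
		char *copy = malloc(cut ? cut : 1);	/* malloc(0) may return NULL */

		if (!copy)
			abort();
		memcpy(copy, full, cut);
		if (check_tag_header(copy, cut) == 0)
			accepted++;
		free(copy);
	}
	return accepted;
}
```

Compiling with `-fsanitize=address` (GCC or Clang) turns any read past a truncated buffer into an immediate error, which is exactly the class of bug the NUL-termination assumption used to hide.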
Git - fast, scalable, distributed revision control system
Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.
Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.
See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with "man gittutorial" or "git help tutorial", and the documentation of each command with "man git-<commandname>" or "git help <commandname>".
CVS users may also want to read Documentation/gitcvs-migration.txt ("man gitcvs-migration" or "git help cvs-migration" if git is installed).
The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines for the project's coding conventions).
Those wishing to help with error message, usage and informational message string translations (localization, l10n) should see po/README.md (a po file is a Portable Object file that holds the translations).
To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org (not the Git list). The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.
Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.
The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.
The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):
- random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
- "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
- "goddamn idiotic truckload of sh*t": when it breaks