We tried to speed up the CI build of one of our software projects at work. Somebody committed some huge (by git’s standards) binaries early in the project’s life. Rewriting git’s history just to get rid of them seems like too much trouble, so we figured doing a shallow clone that avoided those big early commits would be good enough.
I did some experiments with the `--depth` parameter for `git clone` and encountered some weird behavior. This is what the help for `git clone` says about it:
--depth <depth> Create a shallow clone with a history truncated to the specified number of commits. Implies --single-branch unless --no-single-branch is given to fetch the histories near the tips of all branches. If you want to clone submodules shallowly, also pass --shallow-submodules.
This would indicate that `<depth>` equals the number of commits fetched during the clone, but that is not the case. This is what I got when I tried different values for depth:
| depth | commit count (linux repo) | commit count (git repo) |
|-------|---------------------------|-------------------------|
| 1     | 1                         | 1                       |
| 5     | 15                        | 13                      |
| 10    | 80                        | 46                      |
| 100   | 93133                     | 39552                   |
| 1000  | 788718                    | 53880                   |
For cloning I used commands like `git clone --depth 10 https://github.com/torvalds/linux.git` and `git clone --depth 100 https://github.com/git/git.git`, and for counting the commits I used `git log --oneline | wc -l`. (At work I observed the same thing with a GitLab server, so it can’t be an artifact of how GitHub works.)
Does anybody know what is going on? How does the value for depth correspond to the actual amount of data downloaded? Do I understand the documentation wrongly, or is there a bug?
EDIT: I added results for a second repo
As Jonathon Reinhart commented, you’re seeing the effect of merges.
The `--depth` parameter refers to how deep Git goes on a “walk” from each starting point. As the documentation you quoted mentions, it also implies `--single-branch`, which simplifies talking about this. The important point here is that the walk visits all parents of each commit, so a depth level adds more than one commit whenever a commit at the previous level is a merge.
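To make the walk concrete, here is a minimal Python sketch of a depth-limited traversal over a commit graph. It is only a model of the behavior described above, not Git's actual implementation; the function name `shallow_walk` and the two toy histories are hypothetical.

```python
def shallow_walk(parents, tip, depth):
    """Collect every commit within `depth` levels of `tip`, following all parents."""
    seen = {tip}          # depth 1 is the tip itself
    frontier = [tip]
    for _ in range(depth - 1):
        next_frontier = []
        for commit in frontier:
            for parent in parents.get(commit, []):
                if parent not in seen:   # a commit reachable two ways counts once
                    seen.add(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return seen


# Hypothetical linear history: C -> B -> A
linear = {"C": ["B"], "B": ["A"], "A": []}
# Hypothetical history whose tip M merges X and Y, which share the parent W
merged = {"M": ["X", "Y"], "X": ["W"], "Y": ["W"], "W": []}

print(len(shallow_walk(linear, "C", 2)))  # 2
print(len(shallow_walk(merged, "M", 2)))  # 3
```

With the same depth of 2, the linear history yields two commits while the merge at the tip yields three, which is the effect seen in the question's table.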
Suppose we have a commit graph that looks like this:
    $ git log --graph --oneline master
    * cf68824 profile: fix PATH with GOPATH
    * 7c2376b profile: add Ruby gem support
    * 95c8270 profile: set GOPATH
    * 26a9cc3 vimrc: fiddle with netrw directory display
    * 80b88a5 add ruby gems directory to path
    [snip]
Here, each commit has just one parent. If we use `--depth 3`, we’ll pick up the tip commit `cf68824`, its parent `7c2376b` at depth 2, and finally `95c8270` at depth 3. Then we stop, with three commits.
With the Git repository for Git, however:
    $ git log --graph --oneline master
    *   965798d1f2 Merge branch 'es/format-patch-range-diff-fix-fix'
    |\
    | * ac0edf1f46 range-diff: always pass at least minimal diff options
    * |   5335669531 Merge branch 'en/rebase-consistency'
    |\ \
    | * | 6fcbad87d4 rebase docs: fix incorrect format of the section Behavioral Differences
    * | | 7e75a63d74 RelNotes 2.20: drop spurious double quote
    * | | 7a49e44465 RelNotes 2.20: clarify sentence
    [snip]
With `--depth 3`, we start with `965798d1f2`, then, for depth 2, pick up both parents, `5335669531` and `ac0edf1f46`. To add the depth-3 commits, we pick up all the parents of those two commits. The (lone) parent of `ac0edf1f46` is not visible here, while the two parents of `5335669531` are (namely `6fcbad87d4` and `7e75a63d74`). To get the hash IDs of the parents of `ac0edf1f46` we can use:
    $ git rev-parse ac0edf1f46^@
    d8981c3f885ceaddfec0e545b0f995b96e5ec58f
so that gives us our six commits: the tip of `master` (which is currently a merge commit), the two parents of that commit, the one parent of one of those parents, and the two parents of the other.
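The six-commit count can be checked with a small Python sketch that groups commits by depth level. The parent relationships are read off the graph and `git rev-parse` output above, with the hash `d8981c3f88` abbreviated; the helper name `commits_by_depth` is hypothetical, and the order of parents within a level is an assumption that does not affect the count.

```python
def commits_by_depth(parents, tip, depth):
    """Return a list of levels; level 0 holds the tip, level 1 its parents, and so on."""
    levels = [[tip]]
    seen = {tip}
    while len(levels) < depth:
        next_level = []
        for commit in levels[-1]:
            for parent in parents.get(commit, []):
                if parent not in seen:
                    seen.add(parent)
                    next_level.append(parent)
        if not next_level:   # history ran out before reaching the depth limit
            break
        levels.append(next_level)
    return levels


# Parents as shown in the answer; commits past depth 3 are simply left out.
parents = {
    "965798d1f2": ["5335669531", "ac0edf1f46"],
    "5335669531": ["7e75a63d74", "6fcbad87d4"],
    "ac0edf1f46": ["d8981c3f88"],
}

levels = commits_by_depth(parents, "965798d1f2", 3)
for number, level in enumerate(levels, start=1):
    print(f"depth {number}: {len(level)} commit(s): {', '.join(level)}")

total = sum(len(level) for level in levels)
print("total:", total)  # total: 6
```

The levels hold 1, 2, and 3 commits respectively, which adds up to the six commits listed above.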
Depending on precisely when you ran the clone of Git, the tip-most `master` is often not a merge, but often has a merge as its immediate parent, so that `--depth 2` will often get you 3 commits, and `--depth 3` will therefore get at least 5, depending on whether the two parents of the tip of `master` are themselves merges.
(Compare the above `git rev-parse` output with:
    $ git rev-parse 965798d1f2^@
    5335669531d83d7d6c905bcfca9b5f8e182dc4d4
    ac0edf1f46fcf9b9f6f1156e555bdf740cd56c5f
for instance. The `^@` suffix means all parents of the commit, but not the commit itself.)
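As a rough sanity check on the numbers in the question, the walk described here implies simple bounds on the commit count for a given depth. The Python sketch below assumes that no commit has more than two parents (octopus merges would raise the upper bound) and that the history is at least `depth` commits deep; the function name `commit_count_bounds` is hypothetical.

```python
def commit_count_bounds(depth):
    """Return (minimum, maximum) commits for a walk of `depth` levels.

    Minimum: a purely linear history adds one commit per level.
    Maximum: every commit is a two-parent merge with no shared ancestors,
    so level k holds 2**(k - 1) commits and the total is 2**depth - 1.
    """
    return depth, 2 ** depth - 1


# Counts reported in the question: depth -> (linux repo, git repo)
observed = {1: (1, 1), 5: (15, 13), 10: (80, 46)}

for depth, counts in observed.items():
    low, high = commit_count_bounds(depth)
    for count in counts:
        status = "within" if low <= count <= high else "outside"
        print(f"depth {depth}: {count} commits is {status} [{low}, {high}]")
```

For the larger depths in the table the upper bound is astronomically large, so in practice the count is limited by how many commits the repository has within that many levels of the tip.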
Answered By – torek