How does git track file changes internally?

Issue

Could somebody explain how git knows internally that files X, Y and Z have changed? What is the process behind the scenes that recognizes when a file has not yet been added or has modifications? I am asking because, with Subversion it’s simple to figure out that it keeps track of these things by having a .svn directory under each folder, but for git I can’t seem to find a description of the inner workings of this. I doubt it scans through all the sub-directories for changes, as it’s quite fast.

So, out of curiosity, what are its inner workings?

Solution

The mechanism by which git determines the status of a file is fairly straightforward. To know which files have been staged, it simply diffs the HEAD tree against the index. Any items that appear only in the index have been staged for addition, any items that appear only in HEAD have been staged for removal, and any items that differ between the two have had changes staged.
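The comparison described above can be sketched in a few lines of Python. This is only an illustration: it assumes both the HEAD tree and the index have already been flattened into plain dictionaries mapping a path to a blob ID, and the function name staged_changes is made up for this example, not part of git.

```python
def staged_changes(head, index):
    """Classify staged changes by comparing two {path: blob_id} snapshots.

    `head` stands in for the flattened HEAD tree and `index` for the index;
    both are hypothetical simplifications of git's real data structures.
    """
    # Present only in the index: staged for addition.
    added = sorted(p for p in index if p not in head)
    # Present only in HEAD: staged for removal.
    removed = sorted(p for p in head if p not in index)
    # Present in both, but pointing at different content: changes staged.
    modified = sorted(p for p in index if p in head and index[p] != head[p])
    return {"added": added, "removed": removed, "modified": modified}


if __name__ == "__main__":
    head = {"README": "aaa", "main.c": "bbb", "old.c": "ccc"}
    index = {"README": "aaa", "main.c": "ddd", "new.c": "eee"}
    print(staged_changes(head, index))
```

Because only blob IDs are compared, no file contents need to be read at this stage.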

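Similarly, git detects unstaged changes by diffing the index against the working directory. Files that exist in the working directory but have no entry in the index are the ones that have not yet been added.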

Your question in particular asks how this can be so fast (after all, computing the SHA1 hash of a file is not exactly speedy). This is where the index (also known as the cache) comes into play again. Each index entry also has fields for the file size and file modification time. Thus git can simply stat(2) a file on disk and compare the result against the index's file size and file modification time to decide whether the file needs to be hashed at all.
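A minimal sketch of that stat-before-hash shortcut follows. The IndexEntry type and the make_entry and is_modified helpers are hypothetical names invented for this example. Git's real index stores more stat fields than these two and handles edge cases that are ignored here. The blob ID calculation (SHA1 over a "blob <size>\0" header followed by the contents) matches how git hashes file contents.

```python
import hashlib
import os
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical, simplified stand-in for one entry in git's index."""
    size: int
    mtime_ns: int
    blob_id: str


def blob_id(data: bytes) -> str:
    # Git hashes a blob as SHA1("blob <size>\0" + contents).
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def make_entry(path: str) -> IndexEntry:
    """Record size, mtime and content hash, roughly what staging a file does."""
    st = os.stat(path)
    with open(path, "rb") as f:
        data = f.read()
    return IndexEntry(st.st_size, st.st_mtime_ns, blob_id(data))


def is_modified(path: str, entry: IndexEntry) -> bool:
    """Cheap check first (stat), expensive check (hash) only when needed."""
    st = os.stat(path)
    if st.st_size == entry.size and st.st_mtime_ns == entry.mtime_ns:
        # Stat data matches the cached entry: assume unchanged, skip hashing.
        return False
    # Stat data differs, so read and hash the file to find out for sure.
    with open(path, "rb") as f:
        return blob_id(f.read()) != entry.blob_id
```

In the common case where nothing has changed, the cost per file is a single stat call, which is why checking even a large working tree stays quick.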

Answered By – Edward Thomson

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
