Git not converting line endings to LF on commit


Even with core.autocrlf=true set, we still see line endings committed as CRLF instead of LF (the ^M symbol shows up when running git diff and git log -p).

This sometimes causes merge conflicts, because different developers use different settings in their editors.

How can we fix this with minimal future conflicts in a very active repository environment?


I generally recommend using .gitattributes here, rather than setting core.autocrlf (not that I actually deal with this, but that’s how the Git project folks do it, and presumably they know).

.gitattributes example:

*.ts        text eol=lf

That won’t fix files that are already committed with CRLF line endings, but it should help avoid new problems in the future.

To handle merge issues, consider running:

git merge -X renormalize

and/or setting merge.renormalize to true.

Long: why

It’s worth pointing out that Git never does any conversions on commit. The mechanism that implements CRLF-to-LF-only conversion sits not in git commit, but rather in git add. To see why, we must start with a few Git basics:

  • Each commit has a full snapshot of every file (that Git knew about at the time you, or whoever, made the commit). (Each commit also has some metadata, but this is not relevant to this particular issue.)
  • No commit, once made, can ever be changed. Once some file is in a commit, it’s inviolate.
  • The files stored inside a commit are not stored as ordinary files. Instead, they’re stored in a special, read-only, Git-only, compressed (sometimes highly compressed) and de-duplicated form. The de-duplication takes care of the fact that most commits mostly use the same copies of files as some other earlier commit. But this also means that you literally can’t work on / with these copies of the files.
  • The snapshot for a new commit comes not from your working tree copies of files, but rather from Git’s "copies" of files as they appear in Git’s index.

The index is also called the staging area, which is a better name in terms of how users use it, although the ways Git uses it go beyond this (which is why it has the name cache as well, giving it three names: cache, index, and staging area). In any case, these extra "copies" of files exist, in Git’s index. I put "copies" in quotes here because what’s in the index is already in the special form Git uses. These are not ordinary files and cannot be edited either (they can be replaced though).

Instead, there’s a third copy of every file. This third copy is an ordinary file. These are the files that you can see and edit. The catch is that these files are not in Git. They are extracted from Git during git checkout or git switch, or when using git reset or git restore for instance.

During extraction, Git has two options: it can leave the file alone, or it can change it. The change can include replacing LF-only line endings with CRLF line endings. You now have a file you can look at. If you chose to have files extracted with CRLF line endings, the same setting also tells Git to undo those CRLF line endings if and when you have Git replace the index copy.

What git add does is tell Git: Make the index copy match the working tree copy. If you’ve changed the working tree copy, Git will now compress and Git-ify it and use the result to replace the copy that is in the index. Unfortunately for CRLF-line-ending-fixing, if Git thinks that there’s no need to replace the index copy, Git does nothing at all.[1]

Unfortunately, Git checks neither core.autocrlf nor any settings in .gitattributes when deciding whether or not it needs to replace the index copy of some file. So changing either of these does not count as a "change" to any file.[2] In Git 2.16 and later, git add --renormalize helps tell Git: Hey, you dummy, I changed my EOL conversions, so adding a file changes it even if I haven’t changed it. In Git versions predating 2.16, you must trick Git into believing that you changed the file if all you’re trying to do is fix the line endings. (There are a bunch of ways to do that, but let’s just pretend you have git add --renormalize. 😀)
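As a rough sketch of that workflow, the script below builds a throwaway repository, commits a file with CRLF endings, adds the .gitattributes rule from above, and then runs git add --renormalize. The file name app.ts, the identity values, and the temporary directory are placeholders made up for illustration, and the script assumes Git 2.16 or later.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name here is a placeholder for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config core.autocrlf false   # make sure the CRLF endings really get committed

# 1. Commit a file whose lines end in CRLF.
printf 'line one\r\nline two\r\n' > app.ts
git add app.ts
git commit -q -m "Add app.ts with CRLF endings"
before=$(git ls-files --eol app.ts)   # starts with i/crlf: the index copy has CRLF

# 2. Declare the desired normalization, then force Git to re-run the
#    conversion on tracked files even though their contents are unchanged.
printf '*.ts text eol=lf\n' > .gitattributes
git add .gitattributes
git add --renormalize .               # requires Git 2.16+
after=$(git ls-files --eol app.ts)    # starts with i/lf: the index copy is now LF

git commit -q -m "Normalize line endings"
echo "before: $before"
echo "after:  $after"
```

Git may print a warning that CRLF will be replaced by LF; that is expected here. In a real repository you would run only step 2, ideally in one dedicated commit so the whole-file diffs stay out of feature branches.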

Think of Git’s index as holding the proposed next commit. When you run git commit, Git simply snapshots the index. Since it already holds Git-ified files, this goes pretty fast.
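One way to convince yourself of this, sketched here with a made-up file name and identity, is to compare the tree that Git builds from the index (git write-tree) with the tree recorded in the commit you make right afterwards:

```shell
#!/bin/sh
set -e

# Throwaway repository; names are placeholders for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

printf 'hello\n' > greeting.txt
git add greeting.txt

# Tree object built from the current index contents.
index_tree=$(git write-tree)

git commit -q -m "Snapshot the index"

# Tree object recorded in the new commit.
commit_tree=$(git rev-parse 'HEAD^{tree}')

echo "index:  $index_tree"
echo "commit: $commit_tree"
```

The two hashes should be identical, since the commit stores the tree that was built from the index rather than anything read from the working tree.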

In any case, the end result here is this:

  • At the time Git copies a file from Git’s index—or with git restore, from a commit—to your working tree, Git applies the "make the file useful to you, personally" end-of-line changes.

  • At the time Git copies a file from your working tree to Git’s index, Git applies the "make the file normalized for the repository" changes.

When you run git commit, Git uses the copy that is in Git’s index to make the commit. So whatever line endings appear in the index copy, those are the line endings that go into the permanent copy in a new commit. You can’t see those copies directly, but git ls-files can list them, and git ls-files --eol reports the line endings of both the index copy and the working tree copy.
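To look at the two copies side by side, something like the following sketch can help. It assumes core.autocrlf=true as in the question and uses a hypothetical new file, notes.txt, in a throwaway repository. It counts carriage returns in the working tree file and in the index copy, which is read with git cat-file -p :notes.txt.

```shell
#!/bin/sh
set -e

# Throwaway repository; the file name is a placeholder for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config core.autocrlf true

# A brand-new text file with CRLF endings in the working tree.
printf 'first\r\nsecond\r\n' > notes.txt
git add notes.txt   # the conversion to LF happens here, not at commit time

# Count carriage returns in each copy.
worktree_cr=$(tr -cd '\r' < notes.txt | wc -c | tr -d ' ')
index_cr=$(git cat-file -p :notes.txt | tr -cd '\r' | wc -c | tr -d ' ')

echo "CRs in working tree copy: $worktree_cr"
echo "CRs in index copy:        $index_cr"
git ls-files --eol notes.txt
```

The index copy should contain no carriage returns while the working tree file keeps both of its own. The last command prints the same information in summary form, with i/ for the index and w/ for the working tree.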

[1] Actually re-compressing a file into the Git format is a lot of work, so Git cleverly avoids it whenever possible. That’s one of the reasons that Git’s index exists. Other version control systems get by without one; other version control systems are slower. The problem here is just that Git thinks that this work-avoidance is possible too often.

[2] Of course, if you modify .gitattributes, Git will realize that .gitattributes is modified. It’s just that it never extends that to thinking: Oh hey! Maybe that means other files have EOL settings changed.

core.autocrlf vs .gitattributes

There are a bunch of differences between using core.autocrlf and using .gitattributes to specify in-repository line-ending formats. The biggest and most obvious is of course that core.autocrlf is a setting everyone has to make in their personal .git/config or $HOME/.gitconfig or wherever they like to put it, but .gitattributes is a committed file.

Being a committed file has a bunch of ramifications: in particular, you get the existing one when you check out some commit. There’s a copy of that file in each commit (well, each commit made while .gitattributes was in Git’s index), and when you check out that commit, Git obeys that commit’s .gitattributes settings. When you git add files, Git obeys your working tree’s .gitattributes settings, so you can change the settings and git add files, including .gitattributes itself, and your updated settings will apply and will go into the next commit.

Importantly, when listing files in .gitattributes, you are in control. You can tell Git that file xyz is a binary file, if it is binary. You can tell Git that file abc is a text file, if it is text. You can say that *.js or *.py are text and that *.jpg are binary. When you use core.autocrlf, you’re just having Git guess. Git can guess wrong, and do CRLF changes to your binary files, or not do them to your text files.
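As an illustration of that kind of explicit control, the sketch below writes a small .gitattributes and asks Git, via git check-attr, which attributes it would apply to a couple of paths. The patterns and the paths src/app.ts and logo.jpg are made-up examples and the files do not need to exist. Adjust the patterns to whatever your repository actually contains.

```shell
#!/bin/sh
set -e

# Throwaway repository; patterns and paths are placeholders for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Declare which files are text (normalized to LF) and which are binary.
cat > .gitattributes <<'EOF'
*.ts   text eol=lf
*.py   text eol=lf
*.jpg  binary
EOF

# Ask Git what it would apply to specific paths.
ts_eol=$(git check-attr eol -- src/app.ts)
jpg_text=$(git check-attr text -- logo.jpg)

echo "$ts_eol"
echo "$jpg_text"
```

Here binary is Git’s built-in macro attribute that, among other things, unsets text, so no line-ending conversion is ever attempted on the matching files.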

For details on what to put in a .gitattributes file, see the gitattributes documentation.

The renormalize option

When you use git merge to do a three-way merge,[3] there are three input commits: the current commit, the commit you select via the command line, and the merge base. You can normalize line endings in your current commit, by just making a new commit with the fixed line endings. If you really had to, you could check out the commit you are about to select on the command line, normalize its line endings, and commit that too, and use that one on the command line instead. But you literally can’t fix the merge-base commit: Git finds this one on its own, based on the commit graph, that is, the linkage between commits as stored in the commits’ metadata.

Line endings matter to git diff, and the merge depends on the two diffs (from the merge base to your commit, and from the merge base to the commit you select). So it is sometimes necessary to fix the merge base, as well as the current commit and the other commit. The renormalize option does exactly that. This way, there’s no need to do the impossible—to fix historical commits’ line endings. A merge that requires renormalizing goes a little slower, but that’s better than not going at all.
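The following sketch tries to reproduce that situation in a throwaway repository. The merge base and a branch called side both use CRLF, while the starting branch has been normalized to LF. The file name poem.txt, the branch name side, and the identity values are invented for the example.

```shell
#!/bin/sh
set -e

# Throwaway repository; all names are placeholders for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"
git config user.email "dev@example.invalid"
git config core.autocrlf false
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Merge base: CRLF endings, no .gitattributes yet.
printf 'one\r\ntwo\r\nthree\r\n' > poem.txt
git add poem.txt
git commit -q -m "Base commit with CRLF"

# Side branch: edit the last line, still with CRLF endings.
git checkout -q -b side
printf 'one\r\ntwo\r\nthree, edited on side\r\n' > poem.txt
git commit -q -am "Edit last line"

# Starting branch: normalize to LF and record the rule.
git checkout -q "$main"
printf '*.txt text eol=lf\n' > .gitattributes
printf 'one\ntwo\nthree\n' > poem.txt
git add .gitattributes poem.txt
git commit -q -m "Normalize to LF"

# A plain merge sees every line as changed on one side.
if git merge -q --no-edit side >/dev/null 2>&1; then
    plain_merge=clean
else
    plain_merge=conflict
    git merge --abort
fi

# With renormalize, all three versions are normalized before being compared.
git merge -q --no-edit -X renormalize side >/dev/null 2>&1
last_line=$(git cat-file -p HEAD:poem.txt | tail -n 1)

echo "plain merge: $plain_merge"
echo "last line after renormalized merge: $last_line"
```

The plain merge is expected to stop with a conflict, while the renormalized one should pick up the edit from side cleanly. Setting merge.renormalize to true with git config would make the second behavior the default for that repository.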

[3] Remember that git merge might instead do a fast-forward merge, which is not a merge at all.

Answered By – torek

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
