A deeper understanding of git merge operations

Merging is a very common operation in Git: you can merge changes between branches, or perform pull and push operations on remote branches.

However, the git merge command can be a bit daunting for newcomers, because you can get different results from running merge in different situations. This uncertainty about the results has kept me from actively using it for a long time, and I’ve relied on visual interfaces like GitHub’s Pull Request or GitLab’s Merge Request to merge manually.

Today we’re going to take a look at merge.

Understanding Merge

In a version control system, a merge is a basic operation that consolidates the changes that have occurred in a group of files. Typically, when we use Git, we create different branches, and different people add and edit the same files.

The merge is usually done automatically by Git’s algorithm, but if there’s a conflict, such as two different changes to the same place in the same file, you’ll need to resolve it manually.

Recursive Three-Way Merge Algorithm

When Git merges files automatically, it uses the “recursive three-way merge” algorithm, so let’s take a quick look at it. (Git 2.34 and later default to the `ort` strategy, a rewrite built on the same idea.)

Let’s start with the “three-way merge” algorithm, assuming we have the following commit history.

[Figure: Recursive Three-Way Merge Algorithm]

In the diagram above, we merge the feature branch into master, so let’s walk through the merge process.

At this point, master points to commit C and feature points to commit F. Git first finds the nearest common ancestor of the two branches, commit A, and then compares the snapshots of commits A, C, and F, which we’ll call files A, C, and F. Git then compares the contents of the three files “line by line”. If at least two of the three files agree on a line, Git can resolve that line automatically: the version that differs from file A goes into the result file, or the unchanged line if nothing differs.

Specifically, if A and C are the same, the line was changed only in F, so F’s version is kept; likewise, if A and F are the same, the line was changed only in C, so C’s version is kept; if C and F are the same, both branches made the same change relative to A, and that change is kept. If A, C, and F are all the same, nothing changed. If all three differ, both branches changed the line in different ways: that is a conflict, and we need to resolve it manually by choosing what to keep.
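
To make the rules above concrete, here is a minimal Python sketch of the per-line decision. It assumes the three files have the same number of lines and are already aligned position by position, which real Git does not require (it works on diff hunks rather than fixed positions). The function name three_way_merge and the sample lines are made up for illustration.

```python
def three_way_merge(base, ours, theirs):
    """Merge three equally long lists of lines, position by position.

    Returns (merged, conflicts). A conflicting position holds None in
    merged and its 0-based index is recorded in conflicts.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles aligned, equally long inputs")
    merged, conflicts = [], []
    for i, (a, c, f) in enumerate(zip(base, ours, theirs)):
        if c == f:        # both sides agree: unchanged, or the same change
            merged.append(c)
        elif a == c:      # only "theirs" (file F) changed this line
            merged.append(f)
        elif a == f:      # only "ours" (file C) changed this line
            merged.append(c)
        else:             # both changed it in different ways: conflict
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts


# Hypothetical contents of files A, C and F.
file_a = ["import os", "x = 1", "y = 2", "print(x)"]
file_c = ["import os", "x = 10", "y = 2", "print(x)"]
file_f = ["import os", "x = 1", "y = 20", "print(x, y)"]

merged, conflicts = three_way_merge(file_a, file_c, file_f)
print(merged)     # ['import os', 'x = 10', 'y = 20', 'print(x, y)']
print(conflicts)  # []
```

If both branches had changed the same line differently, that position would show up in conflicts instead of being merged.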

After the comparison is finished, Git creates a new merge commit containing a snapshot of the merged result and moves the current branch, master, to point to it.

The three-way merge algorithm relies on finding the common ancestor of the commits being merged. That works fine in simple scenarios, but in a criss-cross merge there is no unique nearest common ancestor, as shown below.

[Figure: criss-cross merge]

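Now we need to merge the feature branch into the main branch, i.e. merge C7 into C8, and we find that C8 and C7 have two nearest common ancestors, C3 and C5. So what do we do? Git first merges C3 and C5 into a temporary virtual commit and then uses that commit as the common ancestor for the three-way merge of C7 and C8. If C3 and C5 have no unique common ancestor either, the same process is applied recursively, which is where the name comes from.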
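
The situation can be reproduced on a toy commit graph. The sketch below is only an illustration: the parents dictionary is a hypothetical criss-cross history that reuses the commit names C1 to C8, and merge_bases is a naive helper written for this article, not how Git computes merge bases internally.

```python
def ancestors(parents, commit):
    """Return the set containing commit and every commit reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(parents, a, b):
    """Return the common ancestors of a and b that are not ancestors of
    another common ancestor (the "nearest" ones)."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return sorted(
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    )


# Hypothetical criss-cross history: each branch merged the other once.
parents = {
    "C1": [],
    "C2": ["C1"], "C3": ["C2"],   # main
    "C4": ["C1"], "C5": ["C4"],   # feature
    "C6": ["C3", "C5"],           # main merges feature
    "C7": ["C5", "C3"],           # feature merges main
    "C8": ["C6"],                 # main moves on
}

print(merge_bases(parents, "C7", "C8"))  # ['C3', 'C5']
print(merge_bases(parents, "C3", "C5"))  # ['C1']
```

C7 and C8 have two nearest common ancestors, while C3 and C5 themselves have a single one, C1, so in this toy history one level of recursion is enough.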

That’s all we need to know about the recursive three-way merge algorithm.

Merging conflicts

If you make different changes to the same part of the same file in two different branches, Git will not be able to merge them automatically, but will pause the merge process and wait for you to resolve the conflicts manually.

First we need to find the files that need attention. git status lists the files that are unmerged because they contain merge conflicts.

$ git status
On branch master
You have unmerged paths.
  (fix conflicts and run "git commit")

Unmerged paths:
  (use "git add <file>..." to mark resolution)

    both modified:      main.py

no changes added to commit (use "git add" and/or "git commit -a")

Resolving a conflict manually usually comes down to choosing one of two versions. Git adds special markers to each conflicting file, like the following.

<<<<<<< HEAD:main.py
print("Hello World")
=======
print("World Hello")
>>>>>>> feature:main.py

The block is split by =======. The upper part, starting at <<<<<<< HEAD:main.py, holds the changes from the current branch, master; the lower part, ending at >>>>>>> feature:main.py, holds the different changes that feature, the branch being merged, made to the same content. We need to edit the file to remove these markers and keep only what we want. For example, to keep the version from master:

print("Hello World")

Of course, you don’t have to pick either side; you can also replace the whole block with completely new content.
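
When every conflict in a file should be settled in favor of the same side, the editing step can be automated. The following is a rough Python sketch under simple assumptions: it only understands the default two-part marker style shown above, and the helper name resolve_conflicts is made up for this article.

```python
def resolve_conflicts(text, keep="ours"):
    """Return text with each conflict block replaced by one side.

    keep="ours" keeps the part above =======, keep="theirs" the part below.
    """
    if keep not in ("ours", "theirs"):
        raise ValueError("keep must be 'ours' or 'theirs'")
    out, state = [], None  # state is None outside a conflict block
    for line in text.splitlines():
        if state is None and line.startswith("<<<<<<< "):
            state = "ours"
        elif state == "ours" and line.startswith("======="):
            state = "theirs"
        elif state == "theirs" and line.startswith(">>>>>>> "):
            state = None
        elif state is None or state == keep:
            out.append(line)
    return "\n".join(out)


conflicted = """<<<<<<< HEAD:main.py
print("Hello World")
=======
print("World Hello")
>>>>>>> feature:main.py"""

print(resolve_conflicts(conflicted, keep="ours"))    # print("Hello World")
print(resolve_conflicts(conflicted, keep="theirs"))  # print("World Hello")
```

For a real repository it is usually safer to review each conflict by hand; the sketch is only meant to show what the markers delimit.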

After you have resolved the conflicts in all the files, mark them as resolved by staging them with git add. Then run git commit to complete the merge. Git records the resolved content in the new merge commit mentioned above.

Fast-forwarding Merges

Sometimes a merge doesn’t create a new merge commit at all. This is called a fast-forward merge.

Suppose we created a feature branch based on master and added some new commits. Now we merge the changes from feature into the master branch.

$ git checkout master
$ git merge feature
Updating f42c576..3a0874c
Fast-forward
 main.py   | 2 ++
 task.py   | 3 +++
 worker.py | 1 +
 3 files changed, 6 insertions(+)

The process is illustrated as follows.

[Figure: Fast-forwarding Merges]

Commit D, which the branch we want to merge (feature) points to, is a direct successor of the commit master points to, so Git simply moves the master pointer forward. In other words, if you can reach one branch by following the history of the other, Git just moves the pointer forward (to the right) when merging the two, because there are no divergent changes to reconcile. That is why it’s called a fast-forward.
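
The decision Git makes here can be sketched on a toy history. This is a simplified Python model, not Git’s implementation: the commit names A to E, the parents and branches dictionaries, and the helper names are all assumptions made for illustration.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if ancestor is reachable from commit (or equal to it)."""
    stack, seen = [commit], set()
    while stack:
        c = stack.pop()
        if c == ancestor:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False


def merge(parents, branches, target, source):
    """Merge source into target, fast-forwarding when that is possible."""
    if is_ancestor(parents, branches[source], branches[target]):
        return "already up to date"
    if is_ancestor(parents, branches[target], branches[source]):
        branches[target] = branches[source]  # just move the branch pointer
        return "fast-forward"
    return "merge commit needed"


# Linear history: feature is two commits ahead of master.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
branches = {"master": "B", "feature": "D"}
print(merge(parents, branches, "master", "feature"))  # fast-forward
print(branches["master"])                             # D

# Diverged history: master has its own commit E that feature lacks.
parents["E"] = ["B"]
branches = {"master": "E", "feature": "D"}
print(merge(parents, branches, "master", "feature"))  # merge commit needed
```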

Git’s Different Merge Strategies

When we use Git, we usually create a number of feature branches from the main branch and merge them back into the main branch when we’re done. There are different strategies for doing that:

Explicit merge via merge
Implicit merge via rebase or fast-forward
Implicit merge after squash

Explicit merge via `merge`

This is the most common and straightforward way to merge, and it is the default merge method on code hosting platforms like GitHub and GitLab.

[Figure: Explicit merge via merge]

When we merge a feature branch into the master branch, Git does a recursive three-way merge of the two branches and creates a new Merge commit with the merge result. This Merge commit is essentially the same as a normal commit, but it has two parents.

$ git cat-file -p 44ba027
tree 5a1692ba62ef346b59e65e4aa441c731bebc51ff
parent 75bf5c59c2e7e493c98e026a415f16b8f0445e4a
parent bbbe6a4c02aa709299ac891779448daf8203df53
author xx <xx@xx.com> 1609141855 +0800
committer xx <xx@xx.com> 1609141855 +0800

Merge branch 'feature' into 'master'

We can see what merges have happened based on Merge commits very clearly in the commit history. On the other hand, a large number of Merge commits can make your commit history very divergent and even messy, and some developers or teams may want a more linear commit history that looks cleaner.
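
Going back to the raw commit object shown above, the number of parent lines is what distinguishes a merge commit from a normal one. As a small illustration, the Python sketch below counts them in the text printed by git cat-file -p; the helper name parent_ids is made up, and the input is simply the output quoted in this article.

```python
def parent_ids(raw_commit):
    """Return the parent hashes listed in the header of a raw commit object."""
    parents = []
    for line in raw_commit.splitlines():
        if line == "":  # the first blank line ends the header
            break
        if line.startswith("parent "):
            parents.append(line.split(" ", 1)[1])
    return parents


raw = """tree 5a1692ba62ef346b59e65e4aa441c731bebc51ff
parent 75bf5c59c2e7e493c98e026a415f16b8f0445e4a
parent bbbe6a4c02aa709299ac891779448daf8203df53
author xx <xx@xx.com> 1609141855 +0800
committer xx <xx@xx.com> 1609141855 +0800

Merge branch 'feature' into 'master'
"""

ids = parent_ids(raw)
print(len(ids))      # 2
print(len(ids) > 1)  # True: more than one parent means a merge commit
```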

It’s important to note that by default Git doesn’t create a separate merge commit in the case of a fast-forward merge. If you want a merge commit in all cases, add the --no-ff option to the git merge command.

Implicit merge via `rebase` or `fast-forward`

We can use rebase instead of merge. I have described the principle and usage of rebase in detail in a previous article, git-rebase: A Brief Analysis. Simply put, rebase finds the most recent common ancestor of the two branches and then replays, in order, the commits the current branch made after that ancestor on top of the target branch. Suppose we have two branches, master and feature, as shown below, and we perform the following actions.

$ git checkout feature
$ git rebase master
$ git checkout master
$ git merge feature

The process is shown in the following figure.

[Figure: rebase and fast-forward]

We first rebased feature onto master, which gave us a fully linear feature branch with no additional merge commits, even though the two branches had diverged.

Then we switch to the master branch and merge feature. After the rebase, every commit on feature is a successor of master, so Git performs a fast-forward merge directly. A fast-forward merge is only possible if master has no commits that feature lacks (the rebase guarantees this), in which case master can simply be moved forward to the latest commit on feature. This merge does not create a separate merge commit either; it just moves the branch label to the new commit.
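
The two steps can be modeled on a toy history. This Python sketch is heavily simplified and makes several assumptions: every commit has a single parent, replaying a commit never conflicts, and a replayed commit is represented by appending a quote mark to its name. The commit names and the helpers chain, rebase and fast_forward are hypothetical; real Git reapplies each commit’s changes and assigns new hashes.

```python
def chain(parents, tip):
    """Return the linear history ending at tip, oldest commit first."""
    history = []
    while tip is not None:
        history.append(tip)
        tip = parents[tip]
    return history[::-1]


def rebase(parents, branches, branch, onto):
    """Replay the commits unique to branch on top of the tip of onto."""
    onto_history = set(chain(parents, branches[onto]))
    unique = [c for c in chain(parents, branches[branch]) if c not in onto_history]
    tip = branches[onto]
    for c in unique:
        replayed = c + "'"       # a replayed commit is a new commit
        parents[replayed] = tip
        tip = replayed
    branches[branch] = tip


def fast_forward(parents, branches, target, source):
    """Move target to source if target's tip is in source's history."""
    if branches[target] in chain(parents, branches[source]):
        branches[target] = branches[source]
        return True
    return False


# master: A-B-C, feature: A-D-E (the branches diverged after A)
parents = {"A": None, "B": "A", "C": "B", "D": "A", "E": "D"}
branches = {"master": "C", "feature": "E"}

rebase(parents, branches, "feature", "master")           # git rebase master
print(chain(parents, branches["feature"]))               # ['A', 'B', 'C', "D'", "E'"]
print(fast_forward(parents, branches, "master", "feature"))  # True
print(branches["master"])                                # E'
```

Without the rebase step, fast_forward would return False for this history, because C is not part of feature’s original history.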

With a rebase or fast-forward implicit merge, we can get a neat linear commit history, but we also lose the contextual information that these commits used to have.

Implicit merge after `squash`

Another strategy for merging changes is to compress all of the feature branch’s commits into a single commit, using the squash command in interactive rebase mode, before performing the rebase and fast-forward merge. This keeps the commit history of the master branch even more linear and tidy. A complete feature ends up in a single commit, but the record and detail of how the feature branch was developed are lost.

All three strategies have distinct advantages and disadvantages, and we can choose based on the specific scenario and our own needs.
