In the first part of this article series we looked into the objects that make up Git’s datastore (blobs, trees and commits). We also saw how commits participate in a directed acyclic graph (or DAG).
In this second and final part of the series, we examine a few Git commands and see how they work with and manipulate the DAG.
I wrote this article for the October 2014 issue of NFJS, The Magazine. It is the second of a two-part series; the first part is available separately.
Initial set up
If you have been playing along since we last met, you can continue to use the gitsGuts repository we set up.
Otherwise, let us quickly initialize a new repository with a few commits so we have a basis to work with.
$ git init gitsGuts (1)
# Initialized empty Git repository in /Users/looselytyped/Documents/articles/gitsGuts/.git/
$ cd gitsGuts (2)
$ (master) echo 'Hello Git!' > README.md (3)
$ (master) mkdir src
$ (master) echo '// This is my source code' > src/Main.java (4)
$ (master) git add .
$ (master) git commit -m "Initial commit" (5)
# [master (root-commit) 3cf00f8] Initial commit
# 2 files changed, 2 insertions(+)
# create mode 100644 README.md
# create mode 100644 src/Main.java
$ (master) echo 'Making another commit' >> README.md (6)
$ (master) git add README.md
$ (master) git commit -m "Second commit" (7)
# [master aed7e05] Second commit
# 1 file changed, 1 insertion(+)
<1> Initialize a new repository
<2> Be sure to cd into it!
<3> Create a README file with some text
<4> Create another plain text file inside the src sub-directory
<5> git-add and git-commit both files
<6> Edit the README file by appending some text
<7> Make a second commit
We now have a Git repository with 2 files and 2 commits. Just to be sure we are on the same page, let us inspect the directory structure using the tree command.
I also display an abbreviated version of the Git log. [1]
$ (master) tree (1)
# .
# ├── README.md
# └── src
#     └── Main.java
# 1 directory, 2 files
$ (master) git lg (2)
# * aed7e05 - (HEAD, master) Second commit <Raju Gandhi>
# * 3cf00f8 - Initial commit <Raju Gandhi>
<1> Display the structure of the repository
<2> An abbreviated Git log
Bear in mind that the hashes of your commits will differ from those you see in my log.
The latest commit in my repository happens to be aed7e05; be sure to remember yours.
Looking good? Then let us talk about branching.
git-branch
If you have used Git for any amount of time, you are almost certainly used to branching, and most likely an ardent proponent of it. You have probably also heard that branching in Git is really cheap. So how does branching in Git really work?
One way to think about branches in Git is to think of them as sticky notes. Picture each sticky note with two lines of text on it: the first line is the name of the branch, written with a thick permanent marker. The second line is written in pencil and is the hash of the last commit on that branch.
Let us start by examining the .git/refs/heads directory, then create a new branch using git-branch, and inspect that directory once again.
$ (master) tree .git/refs/heads (1)
# .git/refs/heads
# └── master
#
# 0 directories, 1 file
$ (master) cat .git/refs/heads/master (2)
# aed7e05f8b3fc115c1c2507c79454c002383e9ee
$ (master) git branch featureBranch (3)
$ (master) tree .git/refs/heads (4)
# .git/refs/heads
# ├── featureBranch
# └── master
#
# 0 directories, 2 files
$ (master) cat .git/refs/heads/featureBranch (5)
# aed7e05f8b3fc115c1c2507c79454c002383e9ee
<1> List the files under the .git/refs/heads directory
<2> Inspect the contents of the master file
<3> Create a new branch using git-branch
<4> List the files under .git/refs/heads again to see a new file
<5> Display the contents of the newly created file
Recall that by default Git creates a master branch for our repository.
Listing the files under the .git/refs/heads directory reveals a file with exactly that name.
Furthermore, .git/refs/heads/master happens to be a plain text file that contains exactly one line of text: the hash of the latest commit on the master branch.
We then create a new branch using the git-branch command, supplying it with the name of the branch.
Inspecting .git/refs/heads once again reveals that a new file now resides beneath it, and the name of the file just so happens to be the name of the newly created branch.
Inspecting the contents of .git/refs/heads/featureBranch tells us that it too contains the same hash as the master file, or in other words, the hash of the latest commit on that branch.
In this illustration I have added how branches play into the DAG. You will notice that this is a slightly different version of the illustration that we saw in Part I of this series — in that I have stripped out the trees that the commits point to, and correspondingly sub-trees and blobs. This will allow us to focus on the DAG.
As you can see, Git branches are simply pointers, or references, that point to commit objects by their hashes. Each branch has two parts to it: the name of the branch, and the commit it points to.
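If a branch is nothing more than a name and a hash, then creating one by hand should be indistinguishable from using git-branch. Here is a minimal sketch of that idea. It assumes a throwaway repository in a temporary directory, a placeholder identity (demo), and a made-up branch name (byHand). It reads references through git rev-parse rather than cat, since Git may also keep references in a packed form instead of one file per branch.

```shell
set -e
# Assumption: a scratch repository with a placeholder identity, not the gitsGuts repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # name the first branch master regardless of Git defaults
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo 'Hello Git!' > README.md
git add README.md
git commit -q -m "Initial commit"

# The everyday way: create a branch at the current commit.
git branch featureBranch

# The low-level way: write a reference by hand (byHand is a hypothetical name).
master_hash=$(git rev-parse refs/heads/master)
git update-ref refs/heads/byHand "$master_hash"

feature_hash=$(git rev-parse refs/heads/featureBranch)
byhand_hash=$(git rev-parse refs/heads/byHand)
echo "master        -> $master_hash"
echo "featureBranch -> $feature_hash"
echo "byHand        -> $byhand_hash"
```

All three names should print the same hash, which is the whole point: no files were copied and no commits were created, only sticky notes.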
Now, what did I mean earlier when I spoke of permanent markers and pencils and sticky notes? It turns out that we can answer that question simply by making another commit. Let us do that, shall we?
$ (master) git status (1)
# On branch master
# nothing to commit, working directory clean
$ (master) echo 'Making a third commit on master' >> README.md (2)
$ (master) git add README.md
$ (master) git commit -m "Third commit" (3)
# [master a509575] Third commit
# 1 file changed, 1 insertion(+)
$ (master) git lg (4)
# * a509575 - (HEAD, master) Third commit <Raju Gandhi>
# * aed7e05 - (featureBranch) Second commit <Raju Gandhi>
# * 3cf00f8 - Initial commit <Raju Gandhi>
<1> Git status tells us we are on the master branch and the working directory is clean
<2> Make an edit
<3> Commit the edit
<4> Display the abbreviated git log
The abbreviated Git log tells us that the master branch is one commit ahead of featureBranch.
If you recall our discussion from Part I of this series, you know that when we made our latest commit Git created a new commit object.
This commit has a calculated hash of a509575 (or a509575203205931cbcfc5a21d11c395ffbdced4 to be precise) and has a pointer to its parent commit, which happens to be aed7e05.
Git also took the sticky note with master on it, erased the hash that was previously written on it and replaced it with a509575203205931cbcfc5a21d11c395ffbdced4.
You can verify this by running cat on .git/refs/heads/master and .git/refs/heads/featureBranch:
$ (master) cat .git/refs/heads/master
# a509575203205931cbcfc5a21d11c395ffbdced4
$ (master) cat .git/refs/heads/featureBranch
# aed7e05f8b3fc115c1c2507c79454c002383e9ee
You can visualize the net effect in the following illustration.
[Illustration: master after the third commit]
It really is that simple! Git simply adds to the DAG just as we expected it to, and updates the appropriate references (written in pencil). The name of the branch needs no updating, hence in our analogy the name can be seen as written with a permanent marker. [2]
You should also note that the commit has no knowledge of any of the references that point to it — that information is maintained outside the DAG.
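To see both claims at once (only the checked-out branch moves, and the commit itself knows nothing about branches), here is a small sketch. It assumes a scratch repository, a placeholder identity, and a hypothetical helper named make_commit that appends a line to README.md and commits it.

```shell
set -e
# Assumption: a scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

make_commit() {   # hypothetical helper: append a line to README.md and commit it
  echo "$1" >> README.md
  git add README.md
  git commit -q -m "$1"
}

make_commit "Initial commit"
git branch featureBranch
before=$(git rev-parse master)

make_commit "Second commit"
after=$(git rev-parse master)
parent=$(git rev-parse 'master^')
feature=$(git rev-parse featureBranch)

# The raw commit object: tree, parent, author, committer, message.
commit_body=$(git cat-file -p master)
echo "$commit_body"

# Count how many lines of the commit object mention a branch reference.
refs_mentioned=$(printf '%s\n' "$commit_body" | grep -c 'refs/heads' || true)
echo "lines mentioning refs/heads: $refs_mentioned"
```

The master note is rewritten, featureBranch stays where it was, and the new commit records its parent but no branch name.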
Quiz time: can you visualize what would happen if I checked out featureBranch and made a commit on that branch?
Git creates a new commit with aed7e05f8b3fc115c1c2507c79454c002383e9ee as the parent, then updates the featureBranch sticky note with the hash of the latest commit on that branch.
Take a look.
[Illustration: featureBranch after a commit of its own]
You can see how the code diverges away from master.
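One way to check the divergence without drawing the DAG is to ask Git for the commit both branches share. The sketch below assumes a scratch repository, a placeholder identity and a hypothetical make_commit helper; git merge-base reports the most recent common ancestor of the two branches.

```shell
set -e
# Assumption: a scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

make_commit() {   # hypothetical helper: append a line to README.md and commit it
  echo "$1" >> README.md
  git add README.md
  git commit -q -m "$1"
}

make_commit "Initial commit"
make_commit "Second commit"
git branch featureBranch
fork=$(git rev-parse featureBranch)   # both branches point here for now

make_commit "Third commit"            # moves master only
git checkout -q featureBranch
make_commit "Feature work"            # moves featureBranch only

merge_base=$(git merge-base master featureBranch)
master_parent=$(git rev-parse 'master^')
feature_parent=$(git rev-parse 'featureBranch^')
git log --graph --oneline --all
```

Both branch tips name the second commit as their parent, so that commit is where the two lines of history part ways.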
What if we were to delete a branch, say master, using git branch -D master? [3]
Git simply takes the sticky note with master on it, crumples it and throws it away!
On inspecting the .git/refs/heads directory you will see that the master file has indeed been deleted.
You might wonder about the commit that master was referencing prior to being deleted.
In our particular scenario you can see that if the master sticky note disappears, there is nothing referencing the latest commit on that branch.
Git will eventually [4] throw that commit away as well.
Note that every other commit object in the DAG still has something referencing it: either a sticky note, or a child commit that treats it as its parent.
As long as something holds a hard reference to a commit object, Git will keep it around; otherwise it will be garbage collected.
[Illustration: the DAG after deleting master]
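Here is a sketch of that scenario that you can run safely, since it works in a scratch repository rather than in gitsGuts. The identity and the make_commit helper are placeholders of mine. It deletes master and then asks two questions: does any branch still contain the orphaned commit, and does the commit object still exist?

```shell
set -e
# Assumption: a scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

make_commit() {   # hypothetical helper: append a line to README.md and commit it
  echo "$1" >> README.md
  git add README.md
  git commit -q -m "$1"
}

make_commit "Initial commit"
make_commit "Second commit"
git branch featureBranch
make_commit "Third commit"
third=$(git rev-parse master)

git checkout -q featureBranch   # we cannot delete the branch we are standing on
git branch -D master            # uppercase -D: master is not merged into featureBranch

master_exists=$(git show-ref --verify --quiet refs/heads/master && echo yes || echo no)
holders=$(git branch --contains "$third")   # branches that can still reach the commit
object_type=$(git cat-file -t "$third")     # the object itself has not been removed yet
echo "master exists: $master_exists"
echo "branches containing $third: ${holders:-none}"
echo "object type: $object_type"
```

No branch reaches the third commit any more, yet the object is still on disk. It only disappears once garbage collection decides it is no longer needed, which is why "eventually" is the operative word.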
In this section we saw how git-branch affects the DAG, and how operations like git-commit and deleting branches affect it.
One thing you might have been wondering about all along is — how does Git know which branch to work on? Let us take a look, shall we?
git-checkout
Whenever we wish to work on a particular branch in Git we have to check it out. What does this mean in terms of the DAG, and is there more to it than meets the eye?
Our leading character for this section is the HEAD file that resides directly beneath the .git directory.
Let us start by inspecting the HEAD file, then checkout (or switch) branches and see what happens. (Please note that if you have been following along on the terminal you should have featureBranch checked out, and we will need to create another branch just so we can switch to it, since we deleted master.)
$ (featureBranch) git branch master (1)
$ (featureBranch) cat .git/HEAD (2)
# ref: refs/heads/featureBranch
$ (featureBranch) git checkout master (3)
# Switched to branch 'master'
$ (master) cat .git/HEAD (4)
# ref: refs/heads/master
<1> Recreate master
<2> List the contents of .git/HEAD
<3> Switch branches
<4> Inspect .git/HEAD again
First things first: the .git/HEAD file tells Git what the HEAD currently points to.
Furthermore, it turns out that the HEAD file, unlike the refs files, does not seem to contain a hash.
Rather, it seems to point to a reference!
Another way to think about this is that the HEAD is a symbolic reference, in that it does not directly point to a hash; rather, it points to the reference that represents the currently checked out commit.
You can visualize how the HEAD works as shown here (I have truncated the diagram for brevity).
[Illustration: featureBranch is checked out]
After checking out master, this is how the DAG would look.
[Illustration: master is checked out]
As you can see, whatever HEAD points to represents what is “checked out”.
But there is more to it than meets the eye.
The most important thing to bear in mind about the HEAD is that it will always represent the parent of the next commit.
There is no exception to this rule.
Knowing this, can you see how making a commit now will work?
Git will kick off all the machinery that is needed to calculate the hashes of the blobs, trees, and finally the commit. It will use the commit that HEAD points to, and make that commit the parent of the next commit.
Now that the commit is a member of the DAG, Git will simply rewrite the master sticky note with the hash of the new commit.
Does the HEAD need updating?
No!
It continues to point to the master reference.
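The following sketch checks each of those statements with Git's own plumbing. It assumes a scratch repository, a placeholder identity and a hypothetical make_commit helper. git symbolic-ref HEAD prints the reference HEAD points to, while git rev-parse resolves a name all the way down to a hash.

```shell
set -e
# Assumption: a scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

make_commit() {   # hypothetical helper: append a line to README.md and commit it
  echo "$1" >> README.md
  git add README.md
  git commit -q -m "$1"
}

make_commit "Initial commit"
head_before=$(git symbolic-ref HEAD)   # the reference HEAD points to
tip_before=$(git rev-parse HEAD)       # the commit that reference resolves to

make_commit "Second commit"
head_after=$(git symbolic-ref HEAD)
tip_after=$(git rev-parse master)
new_parent=$(git rev-parse 'HEAD^')    # parent of the commit we just made

echo "HEAD before: $head_before, after: $head_after"
echo "master moved from $tip_before to $tip_after"
echo "parent of the new commit: $new_parent"
```

HEAD reads refs/heads/master both before and after the commit. Only the hash behind master changed, and the old tip became the parent of the new commit.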
Knowing that the HEAD will always be the parent of the next commit has a few implications.
If you have ever committed on the wrong branch, it was because you lost track of your HEAD (pun intended).
Liberal use of git-status is a good way to avoid that problem.
An alternative is to combine git-prompt.sh with some bash prompt trickery so that the branch you have checked out is always visible when working at the terminal.
There is yet another powerful, and often nerve-racking (especially for newcomers to Git), facet to the HEAD.
For a minute let us consider what happens when we git-checkout a branch.
Git looks in the .git/refs/heads directory to find the file that matches the name of the branch we wish to check out, and identifies the hash that the branch currently points to.
It then looks in the .git/objects directory and finds the commit object that the hash represents and “unfolds” it: it finds the tree the commit points to and recreates the working directory as represented by that tree object.
Finally, it rewrites the .git/HEAD file to symbolically point to the newly checked out branch.
If you were to boil down the git-checkout lookup algorithm to its essence, you could think of Git as checking out a hash!
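The lookup can be walked through by hand with read-only plumbing commands. This is only a sketch of the lookup steps described above, not of what git-checkout does internally. It assumes a scratch repository with a placeholder identity and the same two files we have been using.

```shell
set -e
# Assumption: a scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo 'Hello Git!' > README.md
mkdir src
echo '// This is my source code' > src/Main.java
git add .
git commit -q -m "Initial commit"

branch=master
# Step 1: branch name -> commit hash (the pencil line on the sticky note).
commit_hash=$(git rev-parse "refs/heads/$branch")
# Step 2: commit -> the tree it points to.
tree_hash=$(git rev-parse "$commit_hash^{tree}")
# Step 3: tree -> the files that make up the working directory.
files=$(git ls-tree -r --name-only "$tree_hash")
# Step 4: what HEAD symbolically points to once the branch is checked out.
head_ref=$(git symbolic-ref HEAD)

echo "commit: $commit_hash"
echo "tree:   $tree_hash"
echo "files:"
echo "$files"
echo "HEAD -> $head_ref"
```

Notice that from step 2 onwards the branch name is no longer needed. Everything hangs off the commit hash, which is what makes checking out a bare hash possible at all.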
We are programmers, and now we are curious — what if we were to checkout a hash? What happens? Let us find out, shall we?
$ (master) git lg (1)
# * 40ee28b - (HEAD, master, featureBranch) Some commit <Raju Gandhi>
# * aed7e05 - Second commit <Raju Gandhi>
# * 3cf00f8 - Initial commit <Raju Gandhi>
$ (master) git checkout aed7e05 (2)
# Note: checking out 'aed7e05'.
#
# You are in 'detached HEAD' state. You can look around, make experimental
# changes and commit them, and you can discard any commits you make in this
# state without impacting any branches by performing another checkout.
#
# If you want to create a new branch to retain commits you create, you may
# do so (now or later) by using -b with the checkout command again. Example:
#
# git checkout -b new_branch_name
#
# HEAD is now at aed7e05... Second commit
<1> Abbreviated git log
<2> Pick the second commit and check it out
We start by looking at the log (just so we can pick a commit hash at random) and then proceed to check it out. Git informs us that we are in detached HEAD state — we will see what that means in a minute.
Before we proceed I want you to read the warning that Git emitted when we checked out aed7e05.
Done?
Moving on then …
First things first, what does HEAD point to?
That one is easy: we can simply cat .git/HEAD
$ ((aed7e05...)) cat .git/HEAD
# aed7e05f8b3fc115c1c2507c79454c002383e9ee
Aha! Now .git/HEAD points directly to a hash instead of symbolically pointing to one via a reference.
Let us attempt to visualize how this looks.
As you can see, HEAD now points to a commit directly.
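If you ever need to tell the two states apart in a script, you can ask Git whether HEAD is symbolic. In this sketch (scratch repository, placeholder identity, hypothetical make_commit helper) git symbolic-ref -q HEAD succeeds when HEAD points to a branch and fails quietly when HEAD holds a hash.

```shell
set -e
# Assumption: a scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

make_commit() {   # hypothetical helper: append a line to README.md and commit it
  echo "$1" >> README.md
  git add README.md
  git commit -q -m "$1"
}

make_commit "Initial commit"
make_commit "Second commit"
first=$(git rev-parse 'master^')

git checkout -q "$first"   # check out a hash, not a branch

if branch_ref=$(git symbolic-ref -q HEAD); then
  state="on $branch_ref"
else
  state="detached"
fi
head_hash=$(git rev-parse HEAD)
echo "HEAD is $state at $head_hash"
```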
Knowing this, and that HEAD will always point to the parent of the next commit, can you visualize what would happen if we were to make a commit at this point?
Let us quickly make a commit, and then lay out the DAG so we can conceptualize how the DAG changed.
$ ((aed7e05...)) echo 'In Detached HEAD state' >> README.md (1)
$ ((aed7e05...)) git add README.md (2)
$ ((aed7e05...)) git commit -m "Making a commit in detached HEAD state" (3)
# [detached HEAD ff21829] Making a commit in detached HEAD state
# 1 file changed, 1 insertion(+)
$ ((ff21829...)) git lg (4)
# * ff21829 - (HEAD) Making a commit in detached HEAD state <Raju Gandhi>
# | * 40ee28b - (master, featureBranch) Some commit <Raju Gandhi>
# |/
# * aed7e05 - Second commit <Raju Gandhi>
# * 3cf00f8 - Initial commit <Raju Gandhi>
<1> Make an edit
<2> Add the file to the index
<3> Make a commit
<4> Git log
We see we have a new commit (ff21829) that HEAD now points to.
Any ideas on the DAG?
We are one step away from truly understanding what the “detached” in detached HEAD means.
Answer this question: what happens if we were to git-checkout master or featureBranch (or for that matter any other commit)?
If we were to check out another commit, then the HEAD would directly or indirectly point to that commit, and leave ff21829 behind!
Who points to ff21829 then?
No one!
Which means that when Git’s garbage collector comes around (and it will) our newly created commit will disappear.
Another way to think about detached HEAD state is to think of it as being on an anonymous branch.
See, when we have the HEAD pointing directly to a commit, Git continues to behave as if we were working with a “named” branch, except when we check something else out.
At that point there is a chance that, if we are not careful, the commit that HEAD was pointing to will have nothing else pointing to it.
And we know what happens to commits that have no hard references to them, yes?
What are the chances that we will leave a commit behind?
Let us check out master right now and see what happens.
$ ((ff21829...)) git checkout master
# Warning: you are leaving 1 commit behind, not connected to
# any of your branches:
#
# ff21829 Making a commit in detached HEAD state
#
# If you want to keep them by creating a new branch, this may be a good time
# to do so with:
#
# git branch new_branch_name ff21829
#
# Switched to branch 'master'
Git ever so nicely warns us that we are indeed leaving ff21829 behind, and that if we do wish to keep it around it may serve us well to create a new branch.
It even tells us how to go about doing it.
In essence Git is telling us to create a sticky note as a reminder of the commit’s hash!
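Here is the rescue played out end to end, as a sketch in a scratch repository. The identity, the make_commit helper and the branch name rescued are placeholders of mine. It makes a commit in detached HEAD state, walks away from it, confirms that no branch can reach it, and then attaches a sticky note to it.

```shell
set -e
# Assumption: a scratch repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

make_commit() {   # hypothetical helper: append a line to README.md and commit it
  echo "$1" >> README.md
  git add README.md
  git commit -q -m "$1"
}

make_commit "Initial commit"
make_commit "Second commit"

git checkout -q 'master^'       # detach HEAD at the first commit
make_commit "Making a commit in detached HEAD state"
stray=$(git rev-parse HEAD)

git checkout -q master          # walk away; -q hides the warning Git would print
holders_before=$(git branch --contains "$stray")   # empty: no branch reaches it

git branch rescued "$stray"     # rescued is a hypothetical branch name
rescued_tip=$(git rev-parse rescued)
echo "stray commit: $stray"
echo "rescued tip:  $rescued_tip"
```

Once the rescued note exists, the commit has a hard reference again and the garbage collector will leave it alone.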
Keeping track of the HEAD when working in Git is essential since it dictates where our changes will eventually end up in the DAG.
However, because Git lets us move the HEAD to any arbitrary commit, we can be playful: we can check out any other state of our repository for quick and dirty experimentation or debugging.
If we like what we see we can simply create a new branch and keep our changes around a little bit longer, or simply checkout some other commit and be on our merry way knowing that Git’s garbage collector will come around and clean up our mess for us.
Conclusion
Reiterating what I said about Git at the end of Part I: Git’s power comes from simplicity. The DAG is the fundamental data structure that Git uses to store our repository’s history, and all the commands that we love and use in Git affect that DAG. We now understand how the DAG is built, and we understand how a few commands operate on that DAG.
Take a look at any of Git’s man pages, for git-merge, git-rebase or what-have-you, and you will see references to the DAG everywhere.
Where do we go from here? I suggest that the next time you start to work with Git you keep a picture of the DAG in your mind’s eye. The next time you are about to issue a command to Git, attempt to visualize what the DAG will look like after the command executes, then attempt to find out [5] if you got it right.
Till next time, may the DAG be with you.
[1] git log --graph --all --full-history --color --pretty=format:'%x1b[31m%h%x09%x1b[32m %C(white)- %d%x1b[0m%x20%s %C(bold blue)<%an>%Creset'. I have the same aliased to git lg.
[3] We need the -D (uppercase) flag in this case since Git will complain of master not being fully merged.