Git Tools
Purpose / Context
- You already know day-to-day Git workflows
- track + commit files
- staging area
- topic branching + merging
- This chapter: powerful/advanced tools you might not use every day, but will eventually need
Revision Selection
- Git can refer to:
- a single commit
- a set of commits
- a range of commits
- References can be:
- hashes (full/short)
- branch names
- reflog entries
- ancestry expressions
- range expressions
Single Revisions
- Full SHA-1
- 40-character commit hash (e.g., from git log)
- Short SHA-1 (abbreviated hash)
- Git accepts a prefix of the SHA-1 if:
- at least 4 characters
- unambiguous among all objects in the object database
- Inspect a commit (examples; any unique prefix works)
git show <full_sha>
git show <shorter_unique_prefix>
- Generate abbreviated commits in log output
git log --abbrev-commit --pretty=oneline
- defaults to 7 characters; lengthens as needed to remain unique
- Practical uniqueness
- often 8–10 chars enough within a repo
- example note: very large repos still have unique prefixes (Linux kernel cited)
- Note: SHA-1 collision concerns (and Git’s direction)
- SHA-1 digest: 20 bytes / 160 bits
- Random collisions are astronomically unlikely
- 50% collision probability requires about 2^80 randomly-hashed objects
- probability formula cited: p = (n(n-1)/2) * (1/2^160), where n is the number of hashed objects
- If a collision happened organically:
- Git would reuse the first object with that hash (you’d always get first object’s data)
- Deliberate, synthesized collisions are possible (e.g., shattered.io, Feb 2017)
- Git is moving toward SHA-256 as the default hash algorithm
- more resilient to collision attacks
- mitigation code exists, but cannot fully eliminate attacks
- Branch References
- If a commit is the tip of a branch, you can refer to it by branch name
git show <branch>
- equivalent to git show <sha_of_branch_tip>
- Plumbing tool to resolve refs → SHA-1: git rev-parse
- example: git rev-parse topic1
- purpose: lower-level operations (not typical day-to-day), but useful for “what is this ref really?”
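A minimal sketch of resolving a branch name to its hash, run in a throwaway repository. The branch name topic1 comes from the notes; the temporary directory, identity settings, and commit message are assumptions made only so the script is self-contained.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # fix the branch name regardless of local defaults
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "first commit"
git branch topic1

full=$(git rev-parse topic1)               # full hash the branch points at
short=$(git rev-parse --short topic1)      # abbreviated, still unique prefix
resolved=$(git rev-parse "$short")         # a unique prefix resolves back to the full hash
echo "$short -> $full"
```

The same rev-parse call works for any of the reference forms listed in this section (reflog entries, ancestry expressions), which makes it a handy way to check what an expression means before using it in a destructive command.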
- Reflog Shortnames
- Git records a reflog (local history of where HEAD/refs have pointed)
- View reflog
git reflog
- shows entries like HEAD@{0}, HEAD@{1}, …
- Refer to older values
git show HEAD@{5} (the 5th prior HEAD value in reflog)
- Time-based reflog syntax
git show master@{yesterday}
- Log-format reflog output
git log -g <branch> (e.g., git log -g master)
- Important properties / limitations
- reflog is strictly local
- not shared; differs from other clones
- freshly cloned repo starts with empty reflog (no local activity yet)
- retention is limited (typically a few months)
- time lookups only work while data remains in reflog
- Mental model
- reflog ≈ “shell history” for Git refs (personal/session-local)
- PowerShell gotcha: braces { } are special characters and must be escaped
git show HEAD@{0} (won’t work)
git show HEAD@`{0`} (OK)
git show "HEAD@{0}" (OK)
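A small sketch of how a reflog shortname can locate a commit that no branch points at any more. Everything except the HEAD@{n} syntax (temporary directory, identity, commit messages A/B/C) is an assumption for illustration.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git commit -q --allow-empty -m "C"

git reset -q --hard HEAD~2                 # branch now points at A; C is no longer on the branch

current=$(git log -1 --format=%s HEAD)         # where HEAD is now
previous=$(git log -1 --format=%s 'HEAD@{1}')  # where HEAD pointed before the reset
git reflog | head -n 3                         # the local record of those moves
```

Quoting the HEAD@{1} argument keeps the braces away from the shell, which also sidesteps the escaping issue noted above.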
- Ancestry References
- Caret ^ (parent selection)
- ref^ = parent of ref
- example: HEAD^ = parent of HEAD
- Windows cmd.exe gotcha: ^ is a special character and must be escaped
git show "HEAD^" or git show HEAD^^
- Selecting merge parents
- ref^2 = second parent (merge commits only)
- first parent: branch you were on when merging (often master)
- second parent: branch being merged in (topic branch)
- Tilde ~ (first-parent traversal)
- ref~ ≡ ref^ (first parent)
- ref~2 = first-parent-of-first-parent (grandparent)
- repeated tildes: HEAD~~~ ≡ HEAD~3
- Combining ancestry operators
- example: HEAD~3^2 = second parent of the commit found via HEAD~3 (if that commit is a merge)
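The caret and tilde rules above can be checked on a tiny history containing one merge. This is a sketch under assumptions: the commit messages (A, B, C, T, M) and the branch name topic are invented for the example.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git checkout -q -b topic
git commit -q --allow-empty -m "T"         # work on the topic branch
git checkout -q master
git commit -q --allow-empty -m "C"         # work on master
git merge -q --no-ff -m "M" topic          # merge commit M with parents C and T

first_parent=$(git log -1 --format=%s 'HEAD^')    # branch we were on when merging
second_parent=$(git log -1 --format=%s 'HEAD^2')  # branch that was merged in
grandparent=$(git log -1 --format=%s 'HEAD~2')    # first parent of the first parent
same=$(git rev-parse 'HEAD~2')
also_same=$(git rev-parse 'HEAD^^')               # HEAD~2 and HEAD^^ name the same commit
```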
Commit Ranges
- Motivation / questions answered
- “What work is on this branch that hasn’t been merged into main?”
- “What am I about to push?”
- “What’s unique between two lines of development?”
Double Dot (A..B)
- Meaning
- commits reachable from B but not reachable from A
- Example uses
- “what’s in experiment not in master?”
git log master..experiment
- opposite direction (what’s in master not in experiment)
git log experiment..master
- “what am I about to push?”
git log origin/master..HEAD
- Omitted side defaults to HEAD
git log origin/master.. ≡ git log origin/master..HEAD
Multiple Points (^ / --not)
- Double-dot is shorthand for a common two-point case
- Equivalent forms
git log refA..refB
git log ^refA refB
git log refB --not refA
- Advantage: can exclude multiple refs
- “reachable from refA or refB, but not from refC”
git log refA refB ^refC
git log refA refB --not refC
Triple Dot (A...B)
- Meaning (symmetric difference)
- commits reachable from either A or B but not both
- Example
git log master...experiment
- Often paired with --left-right
git log --left-right master...experiment
- marks which side each commit is from (< vs >)
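The three range forms can be compared by counting commits on a small diverged history. A sketch, assuming two commits that exist only on experiment and one that exists only on master; the commit messages are invented, and git rev-list --count is used here simply as a scriptable stand-in for reading git log output.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "base"
git checkout -q -b experiment
git commit -q --allow-empty -m "E1"
git commit -q --allow-empty -m "E2"
git checkout -q master
git commit -q --allow-empty -m "M1"

only_experiment=$(git rev-list --count master..experiment)   # in experiment, not in master
only_master=$(git rev-list --count experiment..master)       # the opposite direction
caret_form=$(git rev-list --count ^master experiment)        # same set as master..experiment
either_side=$(git rev-list --count master...experiment)      # symmetric difference

git log --oneline --left-right master...experiment           # "<" = master side, ">" = experiment side
```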
Interactive Staging
- Goal
- craft commits that contain only certain combinations/parts of changes
- split large messy changes into focused, reviewable commits
Interactive add mode
- Start
git add -i / git add --interactive
- What it shows
- staged vs unstaged changes per path (like git status, but compact)
- Core commands menu (as shown)
- s: status
- u: update (stage files)
- r: revert (unstage files)
- a: add untracked
- p: patch (stage hunks)
- d: diff (review staged diff)
- q: quit
- h: help
Staging and unstaging files (interactive)
- Stage files
- u / update
- select by numbers (comma-separated)
- * indicates selected items
- press Enter at an empty prompt to stage everything selected
- Unstage files
- r / revert
- select paths to remove from index
- Review staged diff
- d / diff
- select file(s) to see
- comparable to git diff --cached
Staging patches (partial-file staging)
- Enter patch selection
- from interactive prompt: p / patch
- from command line: git add -p / git add --patch
- Git presents hunks and asks whether to stage each
- Hunk prompt options (as listed)
- y: stage this hunk
- n: do not stage this hunk
- a: stage this and all remaining hunks in file
- d: do not stage this hunk nor any remaining hunks in file
- g: select a hunk to go to
- /: search for a hunk matching a regex
- j: leave this hunk undecided, go to next undecided hunk
- J: leave this hunk undecided, go to next hunk
- k: leave this hunk undecided, go to previous undecided hunk
- K: leave this hunk undecided, go to previous hunk
- s: split current hunk into smaller hunks
- e: manually edit the current hunk
- ?: help
- Result
- a file can be partially staged (some staged, some unstaged)
- exit and git commit will commit staged parts only
- Patch mode appears in other commands too
- git reset --patch (partial unstage/reset)
- git checkout --patch (partial checkout/revert)
- git stash save --patch (stash parts; mentioned as further detail later)
Stashing and Cleaning
Stash: why and what it does
- Problem
- need to switch branches while work is half-done
- don’t want to commit unfinished work
- git stash saves:
- modified tracked files (working directory)
- staged changes (index)
- Stores changes on a stack; can reapply later (even on different branch)
- Note: migration to git stash push
- git stash save discussed as being deprecated in favor of git stash push
- key reason: push supports stashing selected pathspecs
Stashing your work (basic flow)
- Observe dirty state
- git status shows staged + unstaged changes
- Create stash
git stash or git stash push
- working directory becomes clean
- List stashes
git stash list (e.g., stash@{0}, stash@{1}, …)
- Apply stash
- most recent: git stash apply
- specific: git stash apply stash@{2}
- can apply on different branch
- conflicts possible if changes don’t apply cleanly
- Restore staged state too
git stash apply --index
- Remove stashes
- drop by name: git stash drop stash@{0}
- apply + drop: git stash pop
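The basic flow can be tried end to end in a scratch repository. This sketch assumes a Git recent enough to have git stash push; the file name a.txt and its contents are placeholders. It stashes a file that has both a staged and an unstaged edit, then restores both with --index.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "one" > a.txt
git add a.txt
git commit -q -m "initial"

echo "two" >> a.txt
git add a.txt                              # staged change
echo "three" >> a.txt                      # further, unstaged change

git stash push -q                          # save both and clean the working directory
clean_status=$(git status --porcelain)     # empty when the tree is clean
stash_count=$(git stash list | wc -l | tr -d ' ')

git stash pop -q --index                   # reapply, restoring the staged/unstaged split, then drop
restored_status=$(git status --porcelain)
remaining=$(git stash list | wc -l | tr -d ' ')
```

Without --index the same changes would come back, but all of them as unstaged modifications.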
Creative stashing (useful options)
- Keep staged changes in index
git stash --keep-index
- stashes staged and unstaged changes, but also leaves the staged changes in the index
- Include untracked files
git stash -u / git stash --include-untracked
- Include ignored files too
git stash --all / git stash -a
- Patch stashing (stash some hunks, keep others)
git stash --patch
- interactive hunk selection (prompt options: y, n, q, a, d, /, e, ?)
Create a branch from a stash
- Use case
- stash is old; applying on current branch causes conflicts
- Command
git stash branch <new-branchname>
- Behavior
- creates a new branch at the commit you were on when stashing
- checks it out
- reapplies stash there
- drops stash if it applies successfully
Cleaning your working directory (git clean)
- Purpose
- remove untracked files/dirs (“cruft”)
- remove build artifacts for clean build
- Caution
- removes files not tracked by Git
- often no way to recover
- safer alternative when unsure: git stash --all
- Common usage
- preview only: git clean -n / git clean --dry-run
- remove untracked files + empty dirs: git clean -f -d
- -f required unless clean.requireForce=false
- Ignored files
- default: ignored files are NOT removed
- remove ignored too: git clean -x
- Interactive cleaning
git clean -x -i
- interactive commands shown:
- clean
- filter by pattern
- select by numbers
- ask each
- quit
- help
- Quirk (nested Git repos)
- directories containing other Git repos may require extra force
- may need a second -f (e.g., git clean -ffd)
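A sketch of the preview-then-clean sequence, showing that ignored files survive until -x is added. All file names (kept.txt, notes.tmp, run.log, build/) and the *.log ignore rule are assumptions chosen for the example.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "tracked" > kept.txt
echo "*.log" > .gitignore
git add kept.txt .gitignore
git commit -q -m "initial"

mkdir build
echo "obj" > build/out.o                   # untracked directory
echo "scratch" > notes.tmp                 # untracked file
echo "debug" > run.log                     # ignored file

git clean -n -d                            # dry run: lists what would be removed

git clean -q -f -d                         # remove untracked files and directories
tmp_gone=$( [ ! -e notes.tmp ] && echo yes || echo no )
build_gone=$( [ ! -d build ] && echo yes || echo no )
log_kept=$( [ -e run.log ] && echo yes || echo no )      # ignored files are left alone

git clean -q -f -d -x                      # -x also removes ignored files
log_gone=$( [ ! -e run.log ] && echo yes || echo no )
tracked_kept=$( [ -e kept.txt ] && echo yes || echo no )
```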
Signing Your Work (GPG)
- Git is cryptographically secure (hashing), but not foolproof for trust
- When consuming work from others, signing helps verify authorship/integrity
GPG setup
- List keys: gpg --list-keys
- Generate key: gpg --gen-key
- Configure Git signing key
git config --global user.signingkey <KEYID>
Signing tags
- Create signed tag
git tag -s <tag> -m '<message>' (instead of -a)
- View signature
git show <tag>
- Passphrase may be required to unlock key
Verifying tags
- Verify signed tag
git tag -v <tag-name>
- Requires signer’s public key in your keyring
- otherwise: “public key not found” / cannot verify
Signing commits
- Sign a commit (Git v1.7.9+)
git commit -S ...
- View/check signatures
git log --show-signature -1
- signature status in custom format:
git log --pretty="format:%h %G? %aN %s"
- example statuses shown in chapter: G = good/valid signature, N = no signature
Enforcing signatures in merges/pulls (Git v1.8.3+)
- Verify signatures during merge/pull
git merge --verify-signatures <branch>
- merge fails if commits are unsigned/untrusted
- Verify + sign resulting merge commit
git merge --verify-signatures -S <branch>
Workflow consideration: everyone must sign
- If you require signing:
- ensure all contributors know how to do it
- otherwise you’ll spend time helping rewrite commits to signed versions
- Understand GPG + benefits before adopting as standard workflow
Searching
git grep (search code)
- Search targets
- working directory (default)
- committed trees
- index (staging area)
- Useful options
- line numbers: -n / --line-number
- per-file match counts: -c / --count
- show enclosing function: -p / --show-function
- Complex queries
- combine expressions on same line with --and
- multiple -e <pattern> expressions
- can search historical trees (example in chapter uses tag v1.8.0)
- output readability helpers: --break, --heading
- Advantages vs external tools (grep/ack)
- very fast
- can search any Git tree, not just current checkout
git log searching (by content)
- Find when a string was introduced/changed (diff-based search)
- Pickaxe (-S)
git log -S <string>
- shows commits that changed number of occurrences of the string
- Regex diff search (-G)
git log -G <regex>
Line history search (git log -L)
- Show history of a function/line range as patches
- Function syntax
git log -L :<function_name>:<file>
- Regex/range alternatives if function parsing fails
- regex + end pattern: git log -L '/<regex>/',/^}/:<file>
- explicit line ranges or a single line number also supported (noted)
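A sketch contrasting git grep (search the current content) with the pickaxe (find the commit that introduced a string). The file notes.txt, the search string ZLIB_BUF_MAX, and the commit messages are placeholders invented for this example.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

printf 'alpha\n' > notes.txt
git add notes.txt
git commit -q -m "add alpha"
printf 'alpha\nZLIB_BUF_MAX\n' > notes.txt
git commit -q -am "introduce constant"
printf 'alpha\nZLIB_BUF_MAX\nbeta\n' > notes.txt
git commit -q -am "add beta"

match=$(git grep -n ZLIB_BUF_MAX)                    # file:line:content in the working tree
introduced=$(git log -S ZLIB_BUF_MAX --format=%s)    # commits that changed the occurrence count
git grep -c alpha HEAD~2                             # the same search against an older tree
```

Only the second commit changes how many times the string occurs, so it is the only one the pickaxe reports, even though the third commit touches the same file.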
Rewriting History
Why rewrite history (locally)
- Make history reflect logical, reviewable changes
- reorder commits
- rewrite messages
- modify commit contents
- squash/split commits
- remove commits entirely
- Cardinal rule
- don’t push until you’re happy
- rewriting pushed history confuses collaborators (treat pushed as final unless strong reason)
Changing the last commit
- Amend message and/or content
git commit --amend
- Common patterns
- fix message only: amend, edit message in editor
- fix content:
- edit files → stage changes → git commit --amend
- Caution
- amending changes SHA-1 (like small rebase)
- don’t amend a commit that’s already pushed
- Tip: avoid editor if message unchanged
git commit --amend --no-edit
- Note: commit message may need updating if content changes substantially
Changing multiple commit messages (interactive rebase)
- Tool: interactive rebase
git rebase -i <upstream>
- Choosing the range
- specify the parent of the oldest commit you want to edit
- example for last 3 commits:
git rebase -i HEAD~3
- Warning
- rewrites every commit in selected range and descendants
- avoid rewriting commits already pushed
- Interactive todo list properties
- commits listed oldest→newest (reverse of typical git log output)
- Git replays commits top→bottom
- Todo commands shown
- pick: use commit
- reword: use commit, edit message
- edit: stop for amending
- squash: meld into previous, edit combined message
- fixup: like squash, discard this commit message
- exec: run shell command
- break: stop here, continue later with git rebase --continue
- drop: remove commit
- label: label current HEAD
- reset: reset HEAD to a label
- merge: create merge commit (with options to keep/reword message)
- notes shown in template:
- lines can be re-ordered
- removing a line loses that commit
- removing everything aborts rebase
- empty commits commented out
Reordering commits (interactive rebase)
- Reorder lines in todo file
- Save + exit
- Git rewinds branch to parent of the todo range
- replays commits in new order
Removing commits (interactive rebase)
- Delete the line or mark it drop
- Effects
- rewriting a commit rewrites all following commits’ SHA-1s
- can cause conflicts if later commits depend on removed one
Squashing commits
- Mark subsequent commits as squash (or fixup)
- Git:
- applies changes together
- opens editor to combine messages (except fixup discards message)
- Outcome
- a single commit replacing multiple commits
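Interactive rebase is normally driven by hand in an editor; the sketch below scripts the same todo edit so the effect of squash can be reproduced non-interactively. It relies on the GIT_SEQUENCE_EDITOR and GIT_EDITOR environment variables. The helper script todo-editor.sh, the file names, and the commit messages are hypothetical, and the temporary path is assumed to contain no spaces.

```shell
set -e
workdir=$(mktemp -d)                       # scratch area (assumption)
repo="$workdir/repo"
mkdir "$repo"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

for name in one two three; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "add $name"
done

# Hypothetical stand-in for editing the todo list by hand:
# change the second line (the newest commit) from "pick" to "squash".
cat > "$workdir/todo-editor.sh" <<'EOF'
sed '2s/^pick/squash/' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF

# GIT_EDITOR=true accepts the proposed combined commit message unchanged.
GIT_SEQUENCE_EDITOR="sh $workdir/todo-editor.sh" GIT_EDITOR=true \
  git rebase -q -i HEAD~2

commit_count=$(git rev-list --count HEAD)          # three commits became two
subject=$(git log -1 --format=%s)                  # subject of the combined commit
file_count=$(git ls-tree --name-only HEAD | wc -l | tr -d ' ')
```

In day-to-day use the todo list would simply be edited in the editor Git opens; the scripted editor exists here only to make the result repeatable.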
Splitting a commit
- Mark target commit as edit in rebase todo
- When rebase stops at that commit
- undo that commit while keeping its changes in the working tree (unstaged)
git reset HEAD^ (mixed reset)
- stage and commit portions into multiple commits
- continue rebase
git rebase --continue
- Reminder
- rewriting changes SHA-1s of affected commit and subsequent commits
- avoid if any are pushed
Aborting or recovering
- Abort in-progress rebase
git rebase --abort
- After completing, recover earlier state
- use reflog (chapter references this as Data Recovery elsewhere)
The nuclear option: filter-branch
- Purpose
- scriptable rewriting across many commits
- examples:
- remove file from every commit
- change email globally
- rewrite project root from subdirectory
- Warning callout
- git filter-branch has many pitfalls; no longer recommended
- prefer git-filter-repo (Python) for most use cases
- Common uses shown
- Remove a file from every commit (e.g., secrets/huge binaries)
git filter-branch --tree-filter 'rm -f passwords.txt' HEAD
- --tree-filter runs command after each checkout; recommits results
- can use patterns (e.g., rm -f *~)
- to run across all branches: --all
- recommended: test in a branch, then hard-reset master if satisfied
- Make a subdirectory the new root
git filter-branch --subdirectory-filter trunk HEAD
- auto-removes commits that didn’t affect the subdirectory
- Change email addresses globally (only yours)
git filter-branch --commit-filter '<script>' HEAD
- script checks GIT_AUTHOR_EMAIL, rewrites author name/email, calls git commit-tree
- note: parent SHA-1 changes propagate, rewriting entire history chain
Reset Demystified (reset & checkout mental model)
The Three Trees (collections of files)
- HEAD
- last commit snapshot; next parent
- pointer to current branch ref → last commit on that branch
- inspect snapshot (plumbing examples shown)
git cat-file -p HEAD
git ls-tree -r HEAD
- Index (staging area)
- proposed next commit snapshot (what git commit uses)
- inspect index (plumbing)
git ls-files -s
- note: implemented as flattened manifest (not a literal tree), but treated as “tree” conceptually
- Working Directory
- sandbox with real files (editable)
- unpacked from .git storage into filesystem
Typical workflow across the three trees
- After git init
- only working directory has content
- git add
- copies content working directory → index
- git commit
- writes index snapshot → commit
- moves current branch pointer (HEAD’s branch)
- Clean state
- HEAD == index == working directory
- Modify file
- working directory differs from index → “Changes not staged”
- Stage file
- index differs from HEAD → “Changes to be committed”
- Checkout behavior summary (mentioned)
- git checkout <branch>:
- moves HEAD to that branch
- fills index with commit snapshot
- copies index → working directory
The role of reset (commit-level)
- Reset manipulates the three trees in order (up to 3 operations)
- Move the branch HEAD points to (REF move)
- Update index to match new HEAD (--mixed)
- Update working directory to match index (--hard)
- Step 1: move HEAD’s branch ref
- always attempted when reset is given a commit
- --soft stops here
- resembles undoing the last git commit (ref moves back)
- Step 2: update index (--mixed, default)
- index becomes snapshot of new HEAD
- --mixed stops here
- resembles undoing git add + git commit
- Step 3: update working directory (--hard)
- working directory overwritten to match index
- this is the dangerous form
- can destroy uncommitted work
- other forms are generally recoverable (e.g., via reflog)
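The three stopping points can be observed through git status after each kind of reset. A sketch with a single file taken through versions v1, v2, v3; the file name and contents are assumptions.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "v1" > file.txt
git add file.txt
git commit -q -m "v1"
echo "v2" > file.txt
git commit -q -am "v2"
echo "v3" > file.txt
git commit -q -am "v3"

git reset -q --soft HEAD~                  # step 1 only: branch moves to v2
soft_status=$(git status --porcelain)      # index still holds v3, so the change shows as staged

git reset -q --mixed HEAD~                 # steps 1-2: branch moves to v1, index reset to v1
mixed_status=$(git status --porcelain)     # working directory still holds v3, now unstaged

git reset -q --hard HEAD                   # steps 1-3: working directory overwritten too
hard_content=$(cat file.txt)
hard_status=$(git status --porcelain)
```

In the porcelain output the first column is the index and the second is the working directory, so the staged change appears as "M " and the unstaged one as " M".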
Reset with a path (file-level)
- Behavior change
- skips step 1 (can’t move a ref “partially”)
- applies index/working-dir updates only for specified paths
- Common unstage use
git reset file.txt
- shorthand for git reset --mixed HEAD file.txt
- copies file from HEAD → index (unstages)
- conceptual opposite of git add file.txt
- Reset a path to a specific commit’s version (index only)
git reset <commit> file.txt
- can prepare a commit that reverts a file without checking out old version into working dir
- Patch mode
- git reset --patch allows selective unstaging/resetting hunks
Squashing commits with reset
- Alternative to interactive rebase for simple cases
- Example flow
git reset --soft HEAD~2
git commit (creates one commit combining last two commits’ changes)
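A sketch of the soft-reset squash on three small commits, assuming placeholder file names and messages. Because --soft leaves the index untouched, the follow-up commit records the combined changes of the two commits that were stepped over.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

for name in a b c; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "add $name"
done

git reset -q --soft HEAD~2                 # branch back at "add a"; b.txt and c.txt stay staged
git commit -q -m "add b and c"             # one commit replaces the last two

commit_count=$(git rev-list --count HEAD)
file_count=$(git ls-tree --name-only HEAD | wc -l | tr -d ' ')
```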
Checkout vs Reset
- Both manipulate the three trees; differences depend on “with paths” or not
Without paths
- git checkout <branch>
- similar outcome to git reset --hard <branch> (trees match target)
- key differences
- working-directory safe
- checks + trivial merges; avoids overwriting local changes where possible
- moves HEAD itself to point to another branch
- git reset <branch>
- moves the branch ref HEAD points to (REF move), not HEAD
With paths
- git checkout [commit] <paths>
- does not move HEAD
- updates index and working directory for those paths
- not working-directory safe (can overwrite local changes)
- supports --patch for hunk-by-hunk revert
Cheat sheet (which trees each command affects)
- Commit level
- reset --soft [commit]: HEAD: REF moves; Index: no; Workdir: no; WD safe: yes
- reset [commit] (default mixed): HEAD: REF moves; Index: yes; Workdir: no; WD safe: yes
- reset --hard [commit]: HEAD: REF moves; Index: yes; Workdir: yes; WD safe: NO
- checkout <commit>: HEAD: HEAD moves; Index: yes; Workdir: yes; WD safe: yes
- File level
- reset [commit] <paths>: HEAD: no; Index: yes; Workdir: no; WD safe: yes
- checkout [commit] <paths>: HEAD: no; Index: yes; Workdir: yes; WD safe: NO
Advanced Merging
Git merge philosophy and practical guidance
- Git often makes merging easy, enabling long-lived branches with frequent merges
- resolve small conflicts often instead of huge conflicts later
- Git avoids “overly clever” auto-resolution
- if ambiguous, it stops and asks you to resolve
- Best practice before merges that might conflict
- start with a clean working directory
- otherwise commit to temp branch or stash
Merge conflicts: tools and strategies
Aborting a merge
- If you don’t want to deal with conflicts yet
git merge --abort
- returns to pre-merge state (unless WIP changes complicate)
- “Start over” option (dangerous)
git reset --hard HEAD (loses uncommitted work)
Ignoring whitespace during merge
- If conflicts are largely whitespace-related
- re-run merge with strategy options
git merge -Xignore-all-space <branch>
git merge -Xignore-space-change <branch>
- Practical benefit
- resolves merges where only formatting/line endings differed
Manual file re-merging (scriptable fixes)
- Use case
- Git can’t auto-handle some transformations (e.g., normalize line endings)
- Concept
- extract three versions of the conflicted file from index stages
- stage 1: base/common ancestor
- stage 2: ours
- stage 3: theirs (MERGE_HEAD)
- Extract versions
git show :1:<file> > <file>.common
git show :2:<file> > <file>.ours
git show :3:<file> > <file>.theirs
- Inspect blob SHAs in index
git ls-files -u
- Preprocess + merge single file
- preprocess one side (example shown: dos2unix on theirs)
- merge with git merge-file -p <file>.ours <file>.common <file>.theirs > <file>
- Compare result vs each side (helpful review)
git diff --ours
git diff --theirs -b (strip whitespace for Git-stored version comparisons)
git diff --base -b
- Cleanup temp artifacts
git clean -f
Checking out conflicts / marker styles / choosing sides
- Re-checkout file with conflict markers
git checkout --conflict=merge <file> (default style)
git checkout --conflict=diff3 <file> (adds inline base section)
- Make diff3 default
git config --global merge.conflictstyle diff3
- Quickly choose one side for a file
git checkout --ours <file>
git checkout --theirs <file>
- useful for binary files or “take one side” decisions
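A sketch that provokes a one-line conflict, reads the three index stages, and then resolves the file by taking one side. The branch name topic, the file f.txt, and its contents are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false

printf 'hello\n' > f.txt
git add f.txt
git commit -q -m "base"
git checkout -q -b topic
printf 'hello theirs\n' > f.txt
git commit -q -am "topic change"
git checkout -q master
printf 'hello ours\n' > f.txt
git commit -q -am "master change"

git merge topic >/dev/null 2>&1 || true    # expected to stop with a conflict in f.txt

unmerged=$(git ls-files -u | wc -l | tr -d ' ')   # one index entry per stage
base=$(git show :1:f.txt)                  # stage 1: common ancestor
ours=$(git show :2:f.txt)                  # stage 2: our side
theirs=$(git show :3:f.txt)                # stage 3: their side (MERGE_HEAD)

git checkout --theirs f.txt                # take their version wholesale
git add f.txt
git commit -q -m "merge topic, taking their version of f.txt"
resolved=$(cat f.txt)
```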
Merge log (find what contributed to conflicts)
- Show unique commits from both sides of merge
git log --oneline --left-right HEAD...MERGE_HEAD
- Show only commits that touch currently conflicted file(s)
git log --oneline --left-right --merge
- add -p to view diffs of the conflicted file(s)
Combined diff format
- During unresolved merge conflicts
- git diff shows “combined diff” (diff --cc)
- two columns indicate differences vs ours and vs theirs
- After resolving conflict
- combined diff highlights:
- what was removed from ours
- what was removed from theirs
- what resolution introduced
- Review after the fact
- git show <merge_commit> shows combined diff for merge
- git log --cc -p includes combined diffs in log output
Undoing merges
- Scenario: accidental merge commit
Option 1: fix references (rewrite history)
- If unwanted merge exists only locally
git reset --hard HEAD~
- Downside
- rewrites history (problematic if others have the commits)
- won’t work safely if other commits happened after the merge (would lose them)
Option 2: reverse the merge commit (revert)
- Create a new commit that undoes changes introduced by merge
git revert -m 1 HEAD
- -m 1 (mainline parent selection)
- keep parent #1 (current branch’s line)
- undo parent #2’s introduced changes
- Important consequence
- history still contains original merged commits
- merging that branch again may say “Already up-to-date”
- later merges may only bring changes since reverted merge
- Fix when you actually want to re-merge later
- “un-revert” the revert commit (git revert ^M as shown conceptually, where ^M stands for the revert commit)
- then merge again to bring full changes
Other types of merges
“Ours” / “Theirs” preference (recursive strategy option)
- Use when conflicts should default to one side
git merge -Xours <branch>
git merge -Xtheirs <branch>
- Behavior
- still merges non-conflicting changes normally
- for conflicts, chooses the specified side entirely (including binaries)
- Similar capability at file-merge level
git merge-file --ours ... (noted)
“ours” merge strategy (-s ours) (fake merge)
- Different from -Xours
- Command
git merge -s ours <branch>
- Behavior
- records merge commit with both parents
- result tree equals current branch (ignores merged-in branch content)
- Use case
- mark work as merged to avoid conflicts later (e.g., backport workflows)
Subtree merging
- Problem solved
- one project is a subdirectory of another
- Example workflow (shown)
- add other project as remote; fetch
- checkout remote branch into local branch (e.g., rack_branch)
- import that branch into subdirectory of main project
git read-tree --prefix=<dir>/ -u <branch>
- merge upstream changes back into main project subtree
git merge --squash -s recursive -Xsubtree=<dir> <branch>
- Notes / tradeoffs (explicitly discussed)
- avoids submodules; all code in one repo
- more complex; easier to make reintegration mistakes; risk of pushing unrelated branches
- Diffing subtree vs a branch
- use git diff-tree -p <branch> (not plain git diff)
Rerere (reuse recorded resolution)
- Meaning: “reuse recorded resolution”
- Value
- remembers how you resolved a conflict hunk
- next time the same conflict appears, resolves automatically
- Useful scenarios cited
- long-lived topic branches: repeated merges without keeping intermediate merge commits
- frequent rebases: avoid re-resolving same conflicts repeatedly
- test-merge many evolving branches: redo merges without re-resolving
- Enable rerere
git config --global rerere.enabled true
- (alternative: create .git/rr-cache directory per repo)
- During a conflict with rerere enabled
- message appears: Recorded preimage for '<file>'
- Inspect rerere data
- git rerere status (files recorded)
- git rerere diff (preimage vs resolved state)
- After resolving and committing
- message: Recorded resolution for '<file>'.
- Reuse in later conflict (merge/rebase)
- message: Resolved '<file>' using previous resolution.
- file may already be clean (markers removed)
- can recreate conflict markers for inspection
git checkout --conflict=merge <file>
- can reapply cached resolution explicitly
git rerere
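A sketch of the record-then-reuse cycle: resolve a conflict once, undo the merge, and merge again to see the stored resolution applied to the file. rerere is enabled only in this scratch repository; the branch name, file name, and contents are assumptions.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config commit.gpgsign false
git config rerere.enabled true             # per-repository instead of --global

printf 'hello\n' > f.txt
git add f.txt
git commit -q -m "base"
git checkout -q -b topic
printf 'hola\n' > f.txt
git commit -q -am "topic greeting"
git checkout -q master
printf 'bonjour\n' > f.txt
git commit -q -am "master greeting"

git merge topic >/dev/null 2>&1 || true    # conflict; the preimage is recorded
printf 'hola bonjour\n' > f.txt            # resolve by hand
git add f.txt
git commit -q -m "merge topic"             # the resolution is recorded

git reset -q --hard HEAD~                  # throw the merge away
git merge topic >/dev/null 2>&1 || true    # same conflict again
reused=$(cat f.txt)                        # file content after rerere has run
```

After the second merge the file holds the earlier resolution, but the merge is still left for review and has to be staged and committed as usual.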
Debugging with Git
File annotation (git blame)
- When you know “where” the bug is, but not “when” it appeared
- Command
git blame <file>
- restrict range with -L <start>,<end>
- Output fields explained
- short SHA-1 of commit that last modified each line
- author name + authored date
- line number + line content
- Special ^ prefix in blame output
- indicates line originated in initial commit and never changed
- Track code movement/copies
- git blame -C tries to find where code was copied from
- can show original file/commit for copied snippets (not just when copied)
Binary search for bug introduction (git bisect)
- Purpose
- find the first bad commit via binary search
- Basic workflow
- start: git bisect start
- mark current as bad: git bisect bad
- mark last known good: git bisect good <good_commit> (example uses v1.0)
- Git checks out midpoint; you test; mark good or bad
- repeat until Git identifies first bad commit
- Output when finished
- indicates first bad commit SHA-1 + commit info + changed paths
- Clean up
git bisect reset (return to original HEAD)
- Automation
- specify range directly: git bisect start <bad> <good>
- run script that returns 0 for good, non-0 for bad
git bisect run <test-script>
Submodules
Motivation and concept
- Need to use another project inside yours while keeping it separate
- Tradeoffs of alternatives
- shared library install: hard to customize; deployment complexity
- copying source: hard to merge upstream changes
- Submodules
- allow a Git repository as a subdirectory of another
- superproject records a specific subproject commit
Starting with submodules
- Add submodule
git submodule add <url> [path]
- default path = repo name
- What changes in superproject
- .gitmodules file created (version-controlled)
- maps submodule.<name>.path and submodule.<name>.url
- submodule directory entry staged as a special Git mode
- mode 160000 (records a commit as a directory entry)
- diff behavior
- git diff --cached <submodule> shows “Subproject commit <sha>”
- nicer: git diff --cached --submodule
- Commit and push as normal (superproject now pins a submodule commit)
- URL accessibility note
- .gitmodules URL is what others use to clone/fetch
- choose a URL others can access
- you can override locally via git config submodule.<name>.url <PRIVATE_URL>
- relative URLs can help in some setups
Cloning a repo with submodules
- Default clone behavior
- submodule directories exist but are empty (no files)
- Initialize and update
git submodule init
git submodule update
- One-step clone
git clone --recurse-submodules <url>
- If you already cloned
- combine init+update: git submodule update --init
- include nested submodules: git submodule update --init --recursive
Working with submodules
Pulling upstream changes from submodule remote (consumer model)
- Manual inside submodule
git fetch
git merge origin/<branch>
- Show submodule changes from superproject
git diff --submodule
- set default diff format: git config --global diff.submodule log
- Auto-update from superproject
git submodule update --remote [<submodule>]
- default branch tracked: submodule’s master unless configured otherwise
- Track a different branch (e.g., stable)
- store for everyone: edit .gitmodules
git config -f .gitmodules submodule.<name>.branch stable
- then git submodule update --remote
- Status improvements
- git status shows submodule “modified (new commits)”
- git config status.submodulesummary 1 shows brief summary
Pulling upstream changes from superproject remote (collaborator model)
- git pull
- fetches superproject commits
- also fetches submodule objects (as shown)
- but does NOT update submodule working directories by default
- Symptoms
- git status shows submodule modified with “new commits”
- arrows in summary may indicate expected commits not checked out locally
- Fix
git submodule update --init --recursive
- Automate
- git pull --recurse-submodules (Git ≥ 2.14)
- default recursion for supported commands: git config submodule.recurse true (pull recursion since Git 2.15)
- Special case: upstream changed submodule URL in .gitmodules
- remedy:
git submodule sync --recursive
git submodule update --init --recursive
Working on a submodule (active development)
- Detached HEAD default issue
- git submodule update often leaves submodule in detached HEAD
- local commits risk being “orphaned” by future updates
- Make it hackable
- enter submodule and checkout a branch
git checkout <branch> (e.g., stable)
- Updating while you have local work
- merge upstream into your local branch
git submodule update --remote --merge
- or rebase local changes
git submodule update --remote --rebase
- if you forget --merge / --rebase
- Git updates submodule checkout and may leave you detached again
- Safety behaviors
- if local changes would be overwritten: update aborts and tells you to commit/stash
- conflicts during --merge update are resolved inside the submodule like normal merges
Publishing submodule changes
- Problem
- pushing superproject that references submodule commits not available on any remote breaks others
- Push options
- check mode (fail if submodules not pushed)
- git push --recurse-submodules=check
- default config: git config push.recurseSubmodules check
- on-demand mode (push submodules first automatically)
- git push --recurse-submodules=on-demand
- default config: git config push.recurseSubmodules on-demand
Merging submodule changes (superproject conflicts)
- Fast-forward case
- if one submodule commit is ancestor of the other, Git chooses the newer (works)
- Divergent case
- Git does not trivial-merge submodule histories for you
- conflict example shown: CONFLICT (submodule): Merge conflict in <submodule>
- Diagnose SHAs
- git diff on the superproject shows both submodule commit IDs
- Manual resolution flow (shown)
- enter submodule
- create a branch pointing to the other side’s SHA (e.g., try-merge)
- merge it, resolve conflicts, commit in submodule
- return to superproject
- git add <submodule> to record resolved submodule pointer
- commit superproject merge
- Alternative case: Git suggests an existing submodule merge commit
- it may print a “possible merge resolution” SHA and a suggested git update-index --cacheinfo 160000 <sha> <path>
- recommended approach still: verify in submodule, fast-forward/merge, then git add + commit
Submodule tips
- Run commands in each submodule
- git submodule foreach '<cmd>'
- examples shown
- stash across all: git submodule foreach 'git stash'
- create branch across all: git submodule foreach 'git checkout -b <branch>'
- unified diffs: git diff; git submodule foreach 'git diff'
- Useful aliases (examples)
- sdiff = diff superproject + each submodule diff
- spush = push with --recurse-submodules=on-demand
- supdate = submodule update --remote --merge
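One way the three aliases could be defined is sketched below. They are written to a scratch repository's local config here (an assumption of this sketch, to avoid touching a real ~/.gitconfig); swapping in --global would make them available everywhere.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# The leading "!" makes Git run the alias through the shell, which is needed
# to chain two commands; the other two are ordinary Git subcommand aliases.
git config alias.sdiff '!'"git diff && git submodule foreach 'git diff'"
git config alias.spush 'push --recurse-submodules=on-demand'
git config alias.supdate 'submodule update --remote --merge'

# Show what was stored.
git config --get-regexp '^alias\.s'
```

After this, git supdate in a superproject is shorthand for git submodule update --remote --merge, and git spush pushes submodules on demand before the superproject.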
Issues with submodules
- Switching branches (older Git < 2.13)
- switching to branch without submodule leaves submodule directory as untracked
- cleanup needed: git clean -ffdx
- switching back requires git submodule update --init
- Newer Git (≥ 2.13)
- git checkout --recurse-submodules <branch> keeps submodules consistent when switching
- you can default recursion: git config submodule.recurse true
- Switching from subdirectories to submodules
- if a directory is already tracked, git submodule add fails (“already exists in the index”)
- fix: git rm -r <dir> first, then git submodule add ...
- switching back to branch where files are tracked (not submodule) can fail due to untracked files overwrite risk
- can force with git checkout -f (danger: overwrites unsaved changes)
- may end with empty submodule directory; may need inside submodule: git checkout .
- Storage note (modern Git)
- submodule Git data stored in superproject’s .git directory
- deleting submodule working directory won’t lose commits/branches
Bundling (git bundle)
- Purpose
- transfer Git data without network protocols (HTTP/SSH)
- Use cases
- no network
- offsite/security constraints
- broken networking hardware
- email/USB transfer
- Create bundle
- git bundle create <file.bundle> <ref_or_range>...
- must list each reference/range to include
- to be cloneable, include HEAD plus branch (example: HEAD master)
- Clone from bundle
- git clone <bundle> <dir>
- if HEAD not included, may need -b <branch> to choose checkout branch
- Incremental bundles (send only new commits)
- you must compute the range manually (unlike network push)
- range examples used
- origin/master..master
- master ^origin/master
- create incremental bundle example pattern
- git bundle create <bundle> master ^<known_base_commit>
- Inspect / validate bundles
- verify bundle and prerequisites
- git bundle verify <bundle>
- list heads
- git bundle list-heads <bundle>
- Import
- fetch from bundle to a local branch
- git fetch <bundle> <bundleBranch>:<localBranch>
- inspect graph with git log --graph --all
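The whole bundle round trip (full bundle, clone, incremental bundle, import) can be rehearsed locally. The following is a sketch under stated assumptions: the branch is named main rather than master, and the directory names, file names, and commit messages are placeholders.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Source repository with two commits on a branch named "main".
git init -q src
cd src
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo one > file.txt && git add file.txt && git commit -q -m "first"
echo two >> file.txt && git commit -q -am "second"

# Full bundle: include HEAD so the result can be cloned directly.
git bundle create ../repo.bundle HEAD main
git bundle verify ../repo.bundle

# Receiving side: clone straight from the bundle file.
cd "$work"
git clone -q repo.bundle dst

# Later: bundle only the commits the receiver does not have yet.
cd "$work/src"
base=$(git rev-parse main)
echo three >> file.txt && git commit -q -am "third"
git bundle create ../incremental.bundle main "^$base"
git bundle list-heads ../incremental.bundle

# Import the incremental bundle into a new local branch on the receiver.
cd "$work/dst"
git fetch -q ../incremental.bundle main:from-bundle
imported=$(git rev-list --count from-bundle)
echo "commits reachable from from-bundle: $imported"
```

The incremental bundle only imports cleanly because the receiver already has the excluded base commit; git bundle verify run inside the receiving repository reports a missing prerequisite otherwise.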
Replace (git replace)
- Core idea
- Git objects are immutable, but replace lets Git pretend object A is object B
- “when you refer to X, use Y instead”
- Common use
- replace a commit without rewriting entire history (vs filter-branch)
- graft histories together (short “recent” history + longer “historical” history)
Example: grafting history without rewriting all SHA-1s
- Split repository into:
- historical repo (commits 1→4)
- truncated recent repo (commits 4→5 + an “instructions” base commit)
- Tools used (as shown)
- create historical branch and push to another remote
- truncate recent history by creating a parentless base commit with plumbing
- git commit-tree <commit>^{tree} (creates new commit from a tree)
- rebase onto that base commit
- git rebase --onto <newBase> <splitPointCommit>
- Recombine in a clone
- fetch both remotes
- git replace <recent_fourth_sha> <historical_fourth_sha>
- Effects/notes
- git log shows full history
- SHA displayed remains the original (the one being “replaced”), but content comes from replacement
- git cat-file -p <old> shows replaced data (including different parent)
- replacement stored as a ref refs/replace/<oldsha>
- can share by pushing that ref
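The graft can be reproduced in a single scratch repository instead of two remotes. This is a simplified sketch, not the exact walkthrough: the branch names main and history, the file names, and the commit messages are all placeholders.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Five linear commits; each adds its own file, so the rebase below cannot conflict.
for n in 1 2 3 4 5; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done
third=$(git rev-parse main~2)
fourth=$(git rev-parse main~1)

# Keep the long history reachable under another name (commits 1 to 4).
git branch history "$fourth"

# Parentless base commit that reuses the third commit's tree.
base=$(echo "history continues elsewhere" | git commit-tree "$third^{tree}")

# Replay commits 4 and 5 onto the new base: main is now truncated.
git rebase -q --onto "$base" "$third"
before=$(git rev-list --count main)

# Graft: wherever the rewritten fourth commit is referenced, use the original.
recent_fourth=$(git rev-parse main~1)
git replace "$recent_fourth" "$fourth"
after=$(git rev-list --count main)

echo "commits visible before replace: $before, after: $after"
git for-each-ref refs/replace
```

Running any command with git --no-replace-objects shows the truncated history again, which is a convenient way to confirm that nothing was rewritten.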
Credential Storage
Problem being solved
- SSH can use keys (possibly no passphrase) → no repeated prompts
- HTTP always needs username/password
- 2FA tokens make passwords harder to type/manage
Built-in credential helper approaches
- No caching (default)
- prompts every connection
- cache
- stores credentials in memory
- not written to disk
- purges after timeout (default 15 minutes / 900s)
- store
- writes credentials to plain-text file (default ~/.git-credentials)
- never expires
- downside: cleartext password on disk
- macOS keychain helper (osxkeychain)
- stores encrypted in system keychain; persists
- Git Credential Manager (Windows/macOS/Linux)
- uses platform-native secure stores
Configuration
- Set helper
- git config --global credential.helper <helper>
- Helper options
- store file location
- git config --global credential.helper 'store --file <path>'
- cache timeout
- git config --global credential.helper 'cache --timeout <seconds>'
- Multiple helpers
- Git queries helpers in order until one returns credentials
- When saving, Git sends creds to all helpers (each decides what to do)
- Example .gitconfig pattern shown
- thumbdrive store + memory cache fallback
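A thumbdrive-plus-cache setup could be configured roughly as follows. The sketch writes to a scratch config file instead of the real ~/.gitconfig; the mount path and timeout value are illustrative assumptions.

```shell
set -eu
cfg=$(mktemp)

# --add appends a second value instead of overwriting the first,
# so Git will consult the helpers in this order.
git config -f "$cfg" --add credential.helper 'store --file /mnt/thumbdrive/.git-credentials'
git config -f "$cfg" --add credential.helper 'cache --timeout 30000'

cat "$cfg"
helper_count=$(git config -f "$cfg" --get-all credential.helper | wc -l | tr -d ' ')
echo "helpers configured: $helper_count"
```

With this ordering, the store helper is queried first and the in-memory cache answers when the thumbdrive file does not have the credential, for example when the drive is unplugged.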
Under the hood (git credential)
- Git’s root credential command
- git credential <action>
- communicates via stdin/stdout key-value protocol
- Example action explained: fill
- Git provides what it knows (e.g., protocol, host)
- blank line ends input
- credential system outputs what it found (username/password)
- if unknown, Git prompts user and outputs what user entered
- How helpers are invoked (forms)
- foo → runs git-credential-foo
- foo -a --opt=bcd → runs git-credential-foo -a --opt=bcd
- /absolute/path/foo -xyz → runs that program
- !<shell> → executes shell code
- Helper action set (slightly different terms)
- get: request credentials
- store: save credentials
- erase: remove credentials
- Output rules
- for get: helper may output additional key=value lines (overriding existing)
- for store / erase: output ignored
- git-credential-store example shown
- store: git credential-store --file <file> store
- get: git credential-store --file <file> get
- file format: credential-decorated URL per line
- https://user:pass@host
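The key-value protocol can be exercised by hand against the store helper. This sketch uses a temporary credentials file and made-up values (host mygithost, user bob), so nothing real is written.

```shell
set -eu
creds=$(mktemp)

# store: hand a complete credential to the helper on stdin.
# The trailing blank line marks the end of input.
printf 'protocol=https\nhost=mygithost\nusername=bob\npassword=s3cre7\n\n' |
  git credential-store --file "$creds" store

# get: supply only what is known; the helper prints what it finds.
found=$(printf 'protocol=https\nhost=mygithost\n\n' |
  git credential-store --file "$creds" get)

echo "$found"

# The backing file holds one credential-decorated URL per line.
cat "$creds"
```

The same stdin/stdout exchange is what Git performs internally when it invokes a configured helper with the get or store action.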
Custom credential helper example: read-only shared store
- Use case described
- team-shared credentials in shared directory
- don’t want to copy to personal credential store
- credentials change often
- Requirements (as listed)
- only handle get; ignore store/erase
- read git-credential-store-compatible file format
- allow configurable path (--file)
- Implementation outline shown (Ruby)
- parse options
- exit unless action is get and file exists
- read stdin key=value pairs until blank line
- scan credential file; match on protocol/host/username
- output protocol/host/username/password if found
- Configure with helper short name
- git config --global credential.helper 'read-only --file <shared_path>'
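The outline above describes a Ruby implementation; the following is a shell rendition of the same steps, offered only as a rough sketch. It assumes each line of the shared file has the plain form protocol://user:pass@host with no URL-encoded characters and a trailing newline, and all names and values in it are hypothetical.

```shell
set -eu
work=$(mktemp -d)
store="$work/shared-credentials"
printf 'https://bob:s3cre7@mygithost\n' > "$store"

# Write the hypothetical helper script (a sketch, not the original Ruby code).
cat > "$work/git-credential-read-only" <<'EOF'
#!/bin/sh
# usage: git-credential-read-only --file <path> <action>
file= action=
while [ $# -gt 0 ]; do
  case "$1" in
    --file) file=$2; shift 2 ;;
    *) action=$1; shift ;;
  esac
done
# Read-only: ignore store/erase, and do nothing if the file is missing.
[ "$action" = get ] || exit 0
[ -f "$file" ] || exit 0
# Read key=value pairs from stdin until a blank line.
protocol= host= username=
while IFS='=' read -r key value; do
  [ -n "$key" ] || break
  case "$key" in
    protocol) protocol=$value ;;
    host) host=$value ;;
    username) username=$value ;;
  esac
done
# Scan the file for the first entry matching protocol, host, and (if given) username.
while IFS= read -r line; do
  p=${line%%://*}; rest=${line#*://}
  userpass=${rest%@*}; h=${rest##*@}
  u=${userpass%%:*}; pw=${userpass#*:}
  [ "$p" = "$protocol" ] || continue
  [ "$h" = "$host" ] || continue
  [ -z "$username" ] || [ "$u" = "$username" ] || continue
  printf 'protocol=%s\nhost=%s\nusername=%s\npassword=%s\n' "$p" "$h" "$u" "$pw"
  exit 0
done < "$file"
EOF
chmod +x "$work/git-credential-read-only"

# Exercise the helper directly, the way Git would call it.
out=$(printf 'protocol=https\nhost=mygithost\n\n' |
  "$work/git-credential-read-only" --file "$store" get)
ignored=$(printf 'protocol=https\nhost=mygithost\n\n' |
  "$work/git-credential-read-only" --file "$store" store)
echo "$out"
```

Placing the script on PATH under the name git-credential-read-only lets the short-name configuration find it; an absolute path in credential.helper works as well.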
Chapter Summary
- You now have advanced tools to:
- select commits and ranges precisely
- stage/commit partial changes interactively
- temporarily shelve work (stash) and safely remove untracked artifacts (clean)
- sign and verify tags/commits with GPG, and optionally enforce signed merges
- search code and history efficiently (grep, log pickaxe/regex, line history)
- rewrite local history confidently (amend, interactive rebase; filter-branch caveats)
- understand reset and checkout via the three-tree model
- handle complex merges (whitespace strategies, manual merges, combined diffs, undo merges, subtree merges, rerere)
- debug regressions (blame, bisect, automated bisect runs)
- manage nested dependencies with submodules (setup, update, push safety, conflicts, tips, caveats)
- transfer Git data offline (bundles)
- “graft” history with virtual object replacement (replace)
- manage credentials with helpers (including writing your own)