Git Tools

Purpose / Context

  • You already know day-to-day Git workflows
    • track + commit files
    • staging area
    • topic branching + merging
  • This chapter: powerful/advanced tools you might not use every day, but will eventually need

Revision Selection

  • Git can refer to:
    • a single commit
    • a set of commits
    • a range of commits
  • References can be:
    • hashes (full/short)
    • branch names
    • reflog entries
    • ancestry expressions
    • range expressions

Single Revisions

  • Full SHA-1
    • 40-character commit hash (e.g., from git log)
  • Short SHA-1 (abbreviated hash)
    • Git accepts a prefix of the SHA-1 if:
      • at least 4 characters
      • unambiguous among all objects in the object database
    • Inspect a commit (examples; any unique prefix works)
      • git show <full_sha>
      • git show <shorter_unique_prefix>
    • Generate abbreviated commits in log output
      • git log --abbrev-commit --pretty=oneline
      • defaults to 7 characters; lengthens as needed to remain unique
    • Practical uniqueness
      • often 8 to 10 characters are enough within a repo
      • example note: very large repos still have unique prefixes (Linux kernel cited)
  • Note: SHA-1 collision concerns (and Git's direction)
    • SHA-1 digest: 20 bytes / 160 bits
    • Random collisions are astronomically unlikely
      • 50% collision probability requires about 2^80 randomly-hashed objects
      • probability formula cited: p = (n(n-1)/2) * (1/2^160)
    • If a collision happened organically:
      • Git would reuse the first object with that hash (you'd always get the first object's data)
    • Deliberate, synthesized collisions are possible (e.g., shattered.io, Feb 2017)
    • Git is moving toward SHA-256 as the default hash algorithm
      • more resilient to collision attacks
      • mitigation code exists, but cannot fully eliminate attacks
  • Branch References
    • If a commit is the tip of a branch, you can refer to it by branch name
      • git show <branch>
      • equivalent to git show <sha_of_branch_tip>
    • Plumbing tool to resolve refs → SHA-1: git rev-parse
      • example: git rev-parse topic1
      • purpose: lower-level operations (not typical day-to-day), but useful for “what is this ref really?”
  • Reflog Shortnames
    • Git records a reflog (local history of where HEAD/refs have pointed)
    • View reflog
      • git reflog
      • shows entries like HEAD@{0}, HEAD@{1}, …
    • Refer to older values
      • git show HEAD@{5} (the 5th prior HEAD value in reflog)
    • Time-based reflog syntax
      • git show master@{yesterday}
    • Log-format reflog output
      • git log -g <branch> (e.g., git log -g master)
    • Important properties / limitations
      • reflog is strictly local
        • not shared; differs from other clones
        • freshly cloned repo starts with empty reflog (no local activity yet)
      • retention is limited (typically a few months)
      • time lookups only work while data remains in reflog
    • Mental model
      • reflog ≈ “shell history” for Git refs (personal/session-local)
    • PowerShell gotcha: escaping braces { }
      • git show HEAD@{0} (won't work: the braces must be escaped)
      • git show HEAD@`{0`} (OK)
      • git show "HEAD@{0}" (OK)
  • Ancestry References
    • Caret ^ (parent selection)
      • ref^ = parent of ref
        • example: HEAD^ = parent of HEAD
      • Windows cmd.exe gotcha: escaping ^
        • git show "HEAD^" or git show HEAD^^
    • Selecting merge parents
      • ref^2 = second parent (merge commits only)
        • first parent: branch you were on when merging (often master)
        • second parent: branch being merged in (topic branch)
    • Tilde ~ (first-parent traversal)
      • ref~ ≡ ref^ (first parent)
      • ref~2 = first-parent-of-first-parent (grandparent)
      • repeated tildes: HEAD~~~ ≡ HEAD~3
    • Combining ancestry operators
      • example: HEAD~3^2 = second parent of the commit found via HEAD~3 (if that commit is a merge)
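The caret and tilde forms above can be checked with git rev-parse. The following is a minimal sketch in a throwaway repository; the branch names (master, topic), the commit messages, and the variable names are hypothetical and exist only for illustration.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master     # fix the branch name for this demo
git config user.name "Example User"         # hypothetical identity
git config user.email "user@example.com"

# History: c1 - c2 - c3 - merge on master, with t1 branching off c2 on topic.
git commit -q --allow-empty -m "c1"
git commit -q --allow-empty -m "c2"
git checkout -q -b topic
git commit -q --allow-empty -m "t1"
git checkout -q master
git commit -q --allow-empty -m "c3"
git merge -q --no-ff -m "Merge topic" topic

first_parent=$(git rev-parse HEAD^)     # parent on the branch we merged into (c3)
tilde_parent=$(git rev-parse HEAD~)     # same commit as HEAD^
second_parent=$(git rev-parse HEAD^2)   # tip of the branch that was merged in (t1)
grandparent=$(git rev-parse HEAD~2)     # first parent of the first parent (c2)

echo "HEAD^  = $first_parent"
echo "HEAD~  = $tilde_parent"
echo "HEAD^2 = $second_parent"
echo "HEAD~2 = $grandparent"
```

HEAD^2 only resolves here because HEAD is a merge commit; on an ordinary commit the same expression is an error.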

Commit Ranges

  • Motivation / questions answered
    • “What work is on this branch that hasn't been merged into main?”
    • “What am I about to push?”
    • “What's unique between two lines of development?”

Double Dot (A..B)

  • Meaning
    • commits reachable from B but not reachable from A
  • Example uses
    • “what's in experiment not in master?”
      • git log master..experiment
    • opposite direction (what's in master not in experiment)
      • git log experiment..master
    • “what am I about to push?”
      • git log origin/master..HEAD
  • Omitted side defaults to HEAD
    • git log origin/master.. ≡ git log origin/master..HEAD

Multiple Points (^ / --not)

  • Double-dot is shorthand for a common two-point case
  • Equivalent forms
    • git log refA..refB
    • git log ^refA refB
    • git log refB --not refA
  • Advantage: can exclude multiple refs
    • “reachable from refA or refB, but not from refC”
      • git log refA refB ^refC
      • git log refA refB --not refC

Triple Dot (A...B)

  • Meaning (symmetric difference)
    • commits reachable from either A or B but not both
  • Example
    • git log master...experiment
  • Often paired with --left-right
    • git log --left-right master...experiment
    • marks which side each commit is from (< vs >)
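The range forms above can be compared side by side by counting commits. This is a sketch under assumed names: a throwaway repository with a master branch holding one extra commit (m1) and an experiment branch holding two (e1, e2).

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"         # hypothetical identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "base"
git checkout -q -b experiment
git commit -q --allow-empty -m "e1"
git commit -q --allow-empty -m "e2"
git checkout -q master
git commit -q --allow-empty -m "m1"

only_experiment=$(git rev-list --count master..experiment)      # e1, e2
only_master=$(git rev-list --count experiment..master)          # m1
caret_form=$(git rev-list --count ^master experiment)           # same set as master..experiment
not_form=$(git rev-list --count experiment --not master)        # same set again
either_side=$(git rev-list --count master...experiment)         # m1, e1, e2

# %m prints the --left-right marker: "<" for the left side, ">" for the right.
left_right=$(git log --left-right --format='%m %s' master...experiment)
echo "$left_right"
```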

Interactive Staging

  • Goal
    • craft commits that contain only certain combinations/parts of changes
    • split large messy changes into focused, reviewable commits

Interactive add mode

  • Start
    • git add -i / git add --interactive
  • What it shows
    • staged vs unstaged changes per path (like git status, but compact)
  • Core commands menu (as shown)
    • s status
    • u update (stage files)
    • r revert (unstage files)
    • a add untracked
    • p patch (stage hunks)
    • d diff (review staged diff)
    • q quit
    • h help

Staging and unstaging files (interactive)

  • Stage files
    • u / update
    • select by numbers (comma-separated)
    • * indicates selected items
    • press Enter at an empty prompt to stage everything selected
  • Unstage files
    • r / revert
    • select paths to remove from index
  • Review staged diff
    • d / diff
    • select file(s) to see
    • comparable to git diff --cached

Staging patches (partial-file staging)

  • Enter patch selection
    • from interactive prompt: p / patch
    • from command line: git add -p / git add --patch
  • Git presents hunks and asks whether to stage each
  • Hunk prompt options (as listed)
    • y stage this hunk
    • n do not stage this hunk
    • a stage this and all remaining hunks in file
    • d do not stage this hunk nor any remaining hunks in file
    • g select a hunk to go to
    • / search for a hunk matching a regex
    • j leave this hunk undecided, go to next undecided hunk
    • J leave this hunk undecided, go to next hunk
    • k leave this hunk undecided, go to previous undecided hunk
    • K leave this hunk undecided, go to previous hunk
    • s split current hunk into smaller hunks
    • e manually edit the current hunk
    • ? help
  • Result
    • a file can be partially staged (some staged, some unstaged)
    • exit and git commit will commit staged parts only
  • Patch mode appears in other commands too
    • git reset --patch (partial unstage/reset)
    • git checkout --patch (partial checkout/revert)
    • git stash save --patch (stash parts; mentioned as further detail later)

Stashing and Cleaning

Stash: why and what it does

  • Problem
    • need to switch branches while work is half-done
    • don't want to commit unfinished work
  • git stash saves:
    • modified tracked files (working directory)
    • staged changes (index)
  • Stores changes on a stack; can reapply later (even on different branch)
  • Note: migration to git stash push
    • git stash save discussed as being deprecated in favor of git stash push
    • key reason: push supports stashing selected pathspecs

Stashing your work (basic flow)

  • Observe dirty state
    • git status shows staged + unstaged changes
  • Create stash
    • git stash or git stash push
    • working directory becomes clean
  • List stashes
    • git stash list (e.g., stash@{0}, stash@{1}, …)
  • Apply stash
    • most recent: git stash apply
    • specific: git stash apply stash@{2}
    • can apply on different branch
    • conflicts possible if changes don't apply cleanly
  • Restore staged state too
    • git stash apply --index
  • Remove stashes
    • drop by name: git stash drop stash@{0}
    • apply + drop: git stash pop

Creative stashing (useful options)

  • Keep staged changes in index
    • git stash --keep-index
    • stashes everything else, but leaves index intact
  • Include untracked files
    • git stash -u / git stash --include-untracked
  • Include ignored files too
    • git stash --all / git stash -a
  • Patch stashing (stash some hunks, keep others)
    • git stash --patch
    • interactive hunk selection (stash prompt options: y, n, q, a, d, /, e, ?)

Create a branch from a stash

  • Use case
    • stash is old; applying on current branch causes conflicts
  • Command
    • git stash branch <new-branchname>
  • Behavior
    • creates a new branch at the commit you were on when stashing
    • checks it out
    • reapplies stash there
    • drops stash if it applies successfully

Cleaning your working directory (git clean)

  • Purpose
    • remove untracked files/dirs (“cruft”)
    • remove build artifacts for clean build
  • Caution
    • removes files not tracked by Git
    • often no way to recover
    • safer alternative when unsure: git stash --all
  • Common usage
    • preview only: git clean -n / git clean --dry-run
    • remove untracked files + empty dirs:
      • git clean -f -d
      • -f required unless clean.requireForce=false
  • Ignored files
    • default: ignored files are NOT removed
    • remove ignored too: git clean -x
  • Interactive cleaning
    • git clean -x -i
    • interactive commands shown:
      • clean
      • filter by pattern
      • select by numbers
      • ask each
      • quit
      • help
  • Quirk (nested Git repos)
    • directories containing other Git repos may require extra force
    • may need a second -f (e.g., git clean -ffd)

Signing Your Work (GPG)

  • Git is cryptographically secure (hashing), but not foolproof for trust
  • When consuming work from others, signing helps verify authorship/integrity

GPG setup

  • List keys: gpg --list-keys
  • Generate key: gpg --gen-key
  • Configure Git signing key
    • git config --global user.signingkey <KEYID>

Signing tags

  • Create signed tag
    • git tag -s <tag> -m '<message>' (instead of -a)
  • View signature
    • git show <tag>
  • Passphrase may be required to unlock key

Verifying tags

  • Verify signed tag
    • git tag -v <tag-name>
  • Requires signer's public key in your keyring
    • otherwise: “public key not found” / cannot verify

Signing commits

  • Sign a commit (Git v1.7.9+)
    • git commit -S ...
  • View/check signatures
    • git log --show-signature -1
    • signature status in custom format: git log --pretty="format:%h %G? %aN %s"
      • example statuses shown in chapter:
        • G = good/valid signature
        • N = no signature

Enforcing signatures in merges/pulls (Git v1.8.3+)

  • Verify signatures during merge/pull
    • git merge --verify-signatures <branch>
    • merge fails if commits are unsigned/untrusted
  • Verify + sign resulting merge commit
    • git merge --verify-signatures -S <branch>

Workflow consideration: everyone must sign

  • If you require signing:
    • ensure all contributors know how to do it
    • otherwise you'll spend time helping rewrite commits to signed versions
  • Understand GPG + benefits before adopting as standard workflow

Searching

git grep (search code)

  • Search targets
    • working directory (default)
    • committed trees
    • index (staging area)
  • Useful options
    • line numbers: -n / --line-number
    • per-file match counts: -c / --count
    • show enclosing function: -p / --show-function
  • Complex queries
    • combine expressions on same line with --and
    • multiple -e <pattern> expressions
    • can search historical trees (example in chapter uses tag v1.8.0)
    • output readability helpers: --break, --heading
  • Advantages vs external tools (grep/ack)
    • very fast
    • can search any Git tree, not just current checkout

git log searching (by content)

  • Find when a string was introduced/changed (diff-based search)
  • Pickaxe (-S)
    • git log -S <string>
    • shows commits that changed number of occurrences of the string
  • Regex diff search (-G)
    • git log -G <regex>
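The difference between -S and -G shows up when a line containing the string is edited without changing how often the string occurs. A small sketch, with a hypothetical config.txt and search term "timeout":

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"         # hypothetical identity
git config user.email "user@example.com"

echo "retries = 3" > config.txt
git add config.txt
git commit -q -m "Add config"

echo "timeout = 5" >> config.txt
git commit -q -am "Add timeout"             # occurrences of "timeout": 0 -> 1

printf 'retries = 3\ntimeout = 10\n' > config.txt
git commit -q -am "Raise timeout"           # occurrences of "timeout": 1 -> 1

pickaxe=$(git log -S timeout --format=%s)   # only commits that change the occurrence count
regex=$(git log -G timeout --format=%s)     # any commit whose diff lines match the regex

echo "-S: $pickaxe"
echo "-G: $regex"
```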

Line history search (git log -L)

  • Show history of a function/line range as patches
  • Function syntax
    • git log -L :<function_name>:<file>
  • Regex/range alternatives if function parsing fails
    • regex + end pattern: git log -L '/<regex>/',/^}/:<file>
    • explicit line ranges or a single line number also supported (noted)

Rewriting History

Why rewrite history (locally)

  • Make history reflect logical, reviewable changes
    • reorder commits
    • rewrite messages
    • modify commit contents
    • squash/split commits
    • remove commits entirely
  • Cardinal rule
    • don't push until you're happy with your work
    • rewriting pushed history confuses collaborators (treat pushed history as final unless you have a strong reason)

Changing the last commit

  • Amend message and/or content
    • git commit --amend
  • Common patterns
    • fix message only: amend, edit message in editor
    • fix content:
      • edit files → stage changes → git commit --amend
  • Caution
    • amending changes SHA-1 (like small rebase)
    • don't amend a commit that's already pushed
  • Tip: avoid editor if message unchanged
    • git commit --amend --no-edit
  • Note: commit message may need updating if content changes substantially

Changing multiple commit messages (interactive rebase)

  • Tool: interactive rebase
    • git rebase -i <upstream>
  • Choosing the range
    • specify the parent of the oldest commit you want to edit
    • example for last 3 commits: git rebase -i HEAD~3
  • Warning
    • rewrites every commit in selected range and descendants
    • avoid rewriting commits already pushed
  • Interactive todo list properties
    • commits listed oldest→newest (reverse of typical git log output)
    • Git replays commits top→bottom
  • Todo commands shown
    • pick use commit
    • reword use commit, edit message
    • edit stop for amending
    • squash meld into previous, edit combined message
    • fixup like squash, discard this commit message
    • exec run shell command
    • break stop here, continue later with git rebase --continue
    • drop remove commit
    • label label current HEAD
    • reset reset HEAD to a label
    • merge create merge commit (with options to keep/reword message)
    • notes shown in template:
      • lines can be re-ordered
      • removing a line loses that commit
      • removing everything aborts rebase
      • empty commits commented out

Reordering commits (interactive rebase)

  • Reorder lines in todo file
  • Save + exit
    • Git rewinds branch to parent of the todo range
    • replays commits in new order

Removing commits (interactive rebase)

  • Delete the line or mark it drop
  • Effects
    • rewriting a commit rewrites all following commits' SHA-1s
    • can cause conflicts if later commits depend on removed one

Squashing commits

  • Mark subsequent commits as squash (or fixup)
  • Git:
    • applies changes together
    • opens editor to combine messages (except fixup discards message)
  • Outcome
    • a single commit replacing multiple commits
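Interactive rebase normally opens an editor, but the todo list can also be edited by a script through the GIT_SEQUENCE_EDITOR environment variable, which makes the squash flow reproducible. The sketch below assumes a sed that accepts -i.bak (GNU and BSD sed both do) and uses hypothetical commit messages; it marks the second todo line as fixup.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"         # hypothetical identity
git config user.email "user@example.com"

for n in 1 2 3; do
  echo "$n" >> log.txt
  git add log.txt
  git commit -q -m "step $n"
done

# The todo for HEAD~2 lists "pick ... step 2" then "pick ... step 3" (oldest first).
# Turning the second line into "fixup" melds step 3 into step 2 and drops its message.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/fixup/'" git rebase -q -i HEAD~2

commit_count=$(git rev-list --count HEAD)
last_message=$(git log -1 --format=%s)
content=$(cat log.txt)
echo "$commit_count commits, last: $last_message"
```

Using squash instead of fixup would additionally open the commit-message editor to combine both messages.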

Splitting a commit

  • Mark target commit as edit in rebase todo
  • When rebase stops at that commit
    • undo that commit, leaving its changes unstaged in the working directory
      • git reset HEAD^ (mixed reset)
    • stage and commit portions into multiple commits
    • continue rebase
      • git rebase --continue
  • Reminder
    • rewriting changes SHA-1s of affected commit and subsequent commits
    • avoid if any are pushed

Aborting or recovering

  • Abort in-progress rebase
    • git rebase --abort
  • After completing, recover earlier state
    • use reflog (chapter references this as Data Recovery elsewhere)

The nuclear option: filter-branch

  • Purpose
    • scriptable rewriting across many commits
    • examples:
      • remove file from every commit
      • change email globally
      • rewrite project root from subdirectory
  • Warning callout
    • git filter-branch has many pitfalls; no longer recommended
    • prefer git-filter-repo (Python) for most use cases
  • Common uses shown
    • Remove a file from every commit (e.g., secrets/huge binaries)
      • git filter-branch --tree-filter 'rm -f passwords.txt' HEAD
      • --tree-filter runs command after each checkout; recommits results
      • can use patterns (e.g., rm -f *~)
      • to run across all branches: --all
      • recommended: test in a branch, then hard-reset master if satisfied
    • Make a subdirectory the new root
      • git filter-branch --subdirectory-filter trunk HEAD
      • auto-removes commits that didn't affect the subdirectory
    • Change email addresses globally (only yours)
      • git filter-branch --commit-filter '<script>' HEAD
      • script checks GIT_AUTHOR_EMAIL, rewrites author name/email, calls git commit-tree
      • note: parent SHA-1 changes propagate, rewriting entire history chain

Reset Demystified (reset & checkout mental model)

The Three Trees (collections of files)

  • HEAD
    • last commit snapshot; next parent
    • pointer to current branch ref → last commit on that branch
    • inspect snapshot (plumbing examples shown)
      • git cat-file -p HEAD
      • git ls-tree -r HEAD
  • Index (staging area)
    • proposed next commit snapshot (what git commit uses)
    • inspect index (plumbing)
      • git ls-files -s
    • note: implemented as flattened manifest (not a literal tree), but treated as “tree” conceptually
  • Working Directory
    • sandbox with real files (editable)
    • unpacked from .git storage into filesystem

Typical workflow across the three trees

  • After git init
    • only working directory has content
  • git add
    • copies content working directory → index
  • git commit
    • writes index snapshot → commit
    • moves current branch pointer (HEAD's branch)
  • Clean state
    • HEAD == index == working directory
  • Modify file
    • working directory differs from index → “Changes not staged”
  • Stage file
    • index differs from HEAD → “Changes to be committed”
  • Checkout behavior summary (mentioned)
    • git checkout <branch>:
      • moves HEAD to that branch
      • fills index with commit snapshot
      • copies index → working directory

The role of reset (commit-level)

  • Reset manipulates the three trees in order (up to 3 operations)
    1. Move the branch HEAD points to (REF move)
    2. Update index to match new HEAD (--mixed)
    3. Update working directory to match index (--hard)
  • Step 1: move HEAD's branch ref
    • always attempted when reset is given a commit
    • --soft stops here
    • resembles undoing the last git commit (ref moves back)
  • Step 2: update index (--mixed, default)
    • index becomes snapshot of new HEAD
    • --mixed stops here
    • resembles undoing git add + git commit
  • Step 3: update working directory (--hard)
    • working directory overwritten to match index
    • this is the dangerous form
      • can destroy uncommitted work
      • other forms are generally recoverable (e.g., via reflog)
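The three steps can be observed one at a time with git status --porcelain, whose first column reports index-vs-HEAD differences and whose second column reports working-directory-vs-index differences. A sketch with a hypothetical f.txt:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"         # hypothetical identity
git config user.email "user@example.com"

echo "first" > f.txt
git add f.txt
git commit -q -m "first"
echo "second version" > f.txt
git commit -q -am "second"

git reset -q --soft HEAD~                   # step 1 only: the branch ref moves back
after_soft=$(git status --porcelain)        # "M  f.txt": change is still staged

git reset -q --mixed HEAD                   # step 2: index now matches HEAD
after_mixed=$(git status --porcelain)       # " M f.txt": change is only in the working directory

git reset -q --hard HEAD                    # step 3: working directory overwritten
after_hard=$(git status --porcelain)        # empty: all three trees agree
content=$(cat f.txt)
echo "final content: $content"
```

The last command discards "second version" from the working directory; the commit itself is still reachable through the reflog.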

Reset with a path (file-level)

  • Behavior change
    • skips step 1 (can't move a ref “partially”)
    • applies index/working-dir updates only for specified paths
  • Common unstage use
    • git reset file.txt
      • shorthand for git reset --mixed HEAD file.txt
      • copies file from HEAD → index (unstages)
      • conceptual opposite of git add file.txt
  • Reset a path to a specific commit's version (index only)
    • git reset <commit> file.txt
    • can prepare a commit that reverts a file without checking out old version into working dir
  • Patch mode
    • git reset --patch allows selective unstaging/resetting hunks

Squashing commits with reset

  • Alternative to interactive rebase for simple cases
  • Example flow
    • git reset --soft HEAD~2
    • git commit (creates one commit combining the last two commits' changes)

Checkout vs Reset

  • Both manipulate the three trees; differences depend on “with paths” or not

Without paths

  • git checkout <branch>
    • similar outcome to git reset --hard <branch> (trees match target)
    • key differences
      • working-directory safe
        • checks + trivial merges; avoids overwriting local changes where possible
      • moves HEAD itself to point to another branch
  • git reset <branch>
    • moves the branch ref HEAD points to (REF move), not HEAD

With paths

  • git checkout [commit] <paths>
    • does not move HEAD
    • updates index and working directory for those paths
    • not working-directory safe (can overwrite local changes)
    • supports --patch for hunk-by-hunk revert

Cheat sheet (which trees each command affects)

  • Commit level
    • reset --soft [commit]
      • HEAD column: REF moves; Index: no; Workdir: no; WD safe: yes
    • reset [commit] (default mixed)
      • REF moves; Index: yes; Workdir: no; WD safe: yes
    • reset --hard [commit]
      • REF moves; Index: yes; Workdir: yes; WD safe: NO
    • checkout <commit>
      • HEAD moves; Index: yes; Workdir: yes; WD safe: yes
  • File level
    • reset [commit] <paths>
      • HEAD: no; Index: yes; Workdir: no; WD safe: yes
    • checkout [commit] <paths>
      • HEAD: no; Index: yes; Workdir: yes; WD safe: NO

Advanced Merging

Git merge philosophy and practical guidance

  • Git often makes merging easy, enabling long-lived branches with frequent merges
    • resolve small conflicts often instead of huge conflicts later
  • Git avoids “overly clever” auto-resolution
    • if ambiguous, it stops and asks you to resolve
  • Best practice before merges that might conflict
    • start with a clean working directory
    • otherwise commit to temp branch or stash

Merge conflicts: tools and strategies

Aborting a merge

  • If you don't want to deal with conflicts yet
    • git merge --abort
    • returns to pre-merge state (unless WIP changes complicate)
  • “Start over” option (dangerous)
    • git reset --hard HEAD (loses uncommitted work)

Ignoring whitespace during merge

  • If conflicts are largely whitespace-related
    • re-run merge with strategy options
      • git merge -Xignore-all-space <branch>
      • git merge -Xignore-space-change <branch>
  • Practical benefit
    • resolves merges where only formatting/line endings differed

Manual file re-merging (scriptable fixes)

  • Use case
    • Git can't auto-handle some transformations (e.g., normalize line endings)
  • Concept
    • extract three versions of the conflicted file from index stages
      • stage 1: base/common ancestor
      • stage 2: ours
      • stage 3: theirs (MERGE_HEAD)
  • Extract versions
    • git show :1:<file> > <file>.common
    • git show :2:<file> > <file>.ours
    • git show :3:<file> > <file>.theirs
  • Inspect blob SHAs in index
    • git ls-files -u
  • Preprocess + merge single file
    • preprocess one side (example shown: dos2unix on theirs)
    • merge with git merge-file -p <file>.ours <file>.common <file>.theirs > <file>
  • Compare result vs each side (helpful review)
    • git diff --ours
    • git diff --theirs -b (strip whitespace for Git-stored version comparisons)
    • git diff --base -b
  • Cleanup temp artifacts
    • git clean -f

Checking out conflicts / marker styles / choosing sides

  • Re-checkout file with conflict markers
    • git checkout --conflict=merge <file> (default style)
    • git checkout --conflict=diff3 <file> (adds inline base section)
  • Make diff3 default
    • git config --global merge.conflictstyle diff3
  • Quickly choose one side for a file
    • git checkout --ours <file>
    • git checkout --theirs <file>
    • useful for binary files or “take one side” decisions

Merge log (find what contributed to conflicts)

  • Show unique commits from both sides of merge
    • git log --oneline --left-right HEAD...MERGE_HEAD
  • Show only commits that touch currently conflicted file(s)
    • git log --oneline --left-right --merge
    • add -p to view diffs of the conflicted file(s)

Combined diff format

  • During unresolved merge conflicts
    • git diff shows “combined diff” (diff --cc)
    • two columns indicate differences vs ours and vs theirs
  • After resolving conflict
    • combined diff highlights:
      • what was removed from ours
      • what was removed from theirs
      • what resolution introduced
  • Review after the fact
    • git show <merge_commit> shows combined diff for merge
    • git log --cc -p includes combined diffs in log output

Undoing merges

  • Scenario: accidental merge commit

Option 1: fix references (rewrite history)

  • If unwanted merge exists only locally
    • git reset --hard HEAD~
  • Downside
    • rewrites history (problematic if others have the commits)
    • won't work safely if other commits happened after the merge (would lose them)

Option 2: reverse the merge commit (revert)

  • Create a new commit that undoes changes introduced by merge
    • git revert -m 1 HEAD
  • -m 1 (mainline parent selection)
    • keep parent #1 (current branch's line)
    • undo the changes introduced by parent #2
  • Important consequence
    • history still contains original merged commits
    • merging that branch again may say “Already up-to-date”
    • later merges may only bring changes since reverted merge
  • Fix when you actually want to re-merge later
    • “un-revert” by reverting the revert commit itself: git revert <revert_commit> (the chapter's diagram labels that commit ^M; it is not literal syntax)
    • then merge again to bring full changes

Other types of merges

“Ours” / “Theirs” preference (recursive strategy option)

  • Use when conflicts should default to one side
    • git merge -Xours <branch>
    • git merge -Xtheirs <branch>
  • Behavior
    • still merges non-conflicting changes normally
    • for conflicts, chooses the specified side entirely (including binaries)
  • Similar capability at file-merge level
    • git merge-file --ours ... (noted)

“ours” merge strategy (-s ours) (fake merge)

  • Different from -Xours
  • Command
    • git merge -s ours <branch>
  • Behavior
    • records merge commit with both parents
    • result tree equals current branch (ignores merged-in branch content)
  • Use case
    • mark work as merged to avoid conflicts later (e.g., backport workflows)

Subtree merging

  • Problem solved
    • one project is a subdirectory of another
  • Example workflow (shown)
    • add other project as remote; fetch
    • checkout remote branch into local branch (e.g., rack_branch)
    • import that branch into subdirectory of main project
      • git read-tree --prefix=<dir>/ -u <branch>
    • merge upstream changes back into main project subtree
      • git merge --squash -s recursive -Xsubtree=<dir> <branch>
  • Notes / tradeoffs (explicitly discussed)
    • avoids submodules; all code in one repo
    • more complex; easier to make reintegration mistakes; risk of pushing unrelated branches
  • Diffing subtree vs a branch
    • use git diff-tree -p <branch> (not plain git diff)

Rerere (reuse recorded resolution)

  • Meaning: “reuse recorded resolution”
  • Value
    • remembers how you resolved a conflict hunk
    • next time the same conflict appears, resolves automatically
  • Useful scenarios cited
    • long-lived topic branches: repeated merges without keeping intermediate merge commits
    • frequent rebases: avoid re-resolving same conflicts repeatedly
    • test-merge many evolving branches: redo merges without re-resolving
  • Enable rerere
    • git config --global rerere.enabled true
    • (alternative: create .git/rr-cache directory per repo)
  • During a conflict with rerere enabled
    • message appears: Recorded preimage for '<file>'
  • Inspect rerere data
    • git rerere status (files recorded)
    • git rerere diff (preimage vs resolved state)
  • After resolving and committing
    • message: Recorded resolution for '<file>'.
  • Reuse in later conflict (merge/rebase)
    • message: Resolved '<file>' using previous resolution.
    • file may already be clean (markers removed)
    • can recreate conflict markers for inspection
      • git checkout --conflict=merge <file>
    • can reapply cached resolution explicitly
      • git rerere

Debugging with Git

File annotation (git blame)

  • When you know “where” the bug is, but not “when” it appeared
  • Command
    • git blame <file>
    • restrict range with -L <start>,<end>
  • Output fields explained
    • short SHA-1 of commit that last modified each line
    • author name + authored date
    • line number + line content
  • Special ^ prefix in blame output
    • indicates line originated in initial commit and never changed
  • Track code movement/copies
    • git blame -C tries to find where code was copied from
    • can show original file/commit for copied snippets (not just when copied)

Binary search for bug introduction (git bisect)

  • Purpose
    • find the first bad commit via binary search
  • Basic workflow
    • start: git bisect start
    • mark current as bad: git bisect bad
    • mark last known good: git bisect good <good_commit> (example uses v1.0)
    • Git checks out midpoint; you test; mark good or bad
    • repeat until Git identifies first bad commit
  • Output when finished
    • indicates first bad commit SHA-1 + commit info + changed paths
  • Clean up
    • git bisect reset (return to original HEAD)
  • Automation
    • specify range directly: git bisect start <bad> <good>
    • run script that returns 0 for good, non-0 for bad
      • git bisect run <test-script>

Submodules

Motivation and concept

  • Need to use another project inside yours while keeping it separate
  • Tradeoffs of alternatives
    • shared library install: hard to customize; deployment complexity
    • copying source: hard to merge upstream changes
  • Submodules
    • allow a Git repository as a subdirectory of another
    • superproject records a specific subproject commit

Starting with submodules

  • Add submodule
    • git submodule add <url> [path]
    • default path = repo name
  • What changes in superproject
    • .gitmodules file created (version-controlled)
      • maps submodule.<name>.path and submodule.<name>.url
    • submodule directory entry staged as a special Git mode
      • mode 160000 (records a commit as a directory entry)
    • diff behavior
      • git diff --cached <submodule> shows “Subproject commit <sha>”
      • nicer: git diff --cached --submodule
  • Commit and push as normal (superproject now pins a submodule commit)
  • URL accessibility note
    • .gitmodules URL is what others use to clone/fetch
    • choose a URL others can access
    • you can override locally via git config submodule.<name>.url <PRIVATE_URL>
    • relative URLs can help in some setups

Cloning a repo with submodules

  • Default clone behavior
    • submodule directories exist but are empty (no files)
  • Initialize and update
    • git submodule init
    • git submodule update
  • One-step clone
    • git clone --recurse-submodules <url>
  • If you already cloned
    • combine init+update: git submodule update --init
    • include nested submodules: git submodule update --init --recursive

Working with submodules

Pulling upstream changes from submodule remote (consumer model)

  • Manual inside submodule
    • git fetch
    • git merge origin/<branch>
  • Show submodule changes from superproject
    • git diff --submodule
    • set default diff format:
      • git config --global diff.submodule log
  • Auto-update from superproject
    • git submodule update --remote [<submodule>]
    • default branch tracked: the default branch of the submodule’s remote (its HEAD; older Git versions assumed master) unless configured otherwise
  • Track a different branch (e.g., stable)
    • store for everyone: edit .gitmodules
      • git config -f .gitmodules submodule.<name>.branch stable
    • then git submodule update --remote
  • Status improvements
    • git status shows submodule “modified (new commits)”
    • git config status.submodulesummary 1 shows brief summary

Pulling upstream changes from superproject remote (collaborator model)

  • git pull
    • fetches superproject commits
    • also fetches submodule objects (as shown)
    • but does NOT update submodule working directories by default
  • Symptoms
    • git status shows submodule modified with “new commits”
    • left-pointing arrows (<) in the summary indicate commits the superproject expects but that are not checked out locally
  • Fix
    • git submodule update --init --recursive
  • Automate
    • git pull --recurse-submodules (Git ≥ 2.14)
    • default recursion for supported commands: git config submodule.recurse true (pull recursion since Git 2.15)
  • Special case: upstream changed submodule URL in .gitmodules
    • remedy:
      • git submodule sync --recursive
      • git submodule update --init --recursive

Working on a submodule (active development)

  • Detached HEAD default issue
    • git submodule update often leaves submodule in detached HEAD
    • local commits risk being “orphaned” by future updates
  • Make it hackable
    • enter submodule and checkout a branch
      • git checkout <branch> (e.g., stable)
  • Updating while you have local work
    • merge upstream into your local branch
      • git submodule update --remote --merge
    • or rebase local changes
      • git submodule update --remote --rebase
    • if you forget --merge/--rebase
      • Git updates submodule checkout and may leave you detached again
  • Safety behaviors
    • if local changes would be overwritten: update aborts and tells you to commit/stash
    • conflicts during --merge update are resolved inside the submodule like normal merges

Publishing submodule changes

  • Problem
    • pushing superproject that references submodule commits not available on any remote breaks others
  • Push options
    • check mode (fail if submodules not pushed)
      • git push --recurse-submodules=check
      • default config: git config push.recurseSubmodules check
    • on-demand mode (push submodules first automatically)
      • git push --recurse-submodules=on-demand
      • default config: git config push.recurseSubmodules on-demand
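
The idea behind check mode can be modeled in a few lines. This is a toy sketch, not Git's implementation: the function names and the dictionaries describing pinned and pushed commits are assumptions made for illustration.

```python
def unpushed_submodules(pinned, pushed):
    """List submodules whose pinned commit is missing from their remote.

    pinned: {submodule path: commit ID recorded by the superproject}
    pushed: {submodule path: set of commit IDs present on the remote}
    """
    return sorted(
        path for path, commit in pinned.items()
        if commit not in pushed.get(path, set())
    )


def check_push(pinned, pushed):
    """Mimic check mode: refuse the push if any submodule is unpushed."""
    missing = unpushed_submodules(pinned, pushed)
    if missing:
        raise RuntimeError("unpushed submodule commits: " + ", ".join(missing))
    return True
```

On-demand mode differs only in what happens when something is missing: instead of failing, the submodules are pushed first.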

Merging submodule changes (superproject conflicts)

  • Fast-forward case
    • if one submodule commit is an ancestor of the other, Git picks the later commit and the merge succeeds
  • Divergent case
    • Git does not attempt even a trivial merge of the submodule histories for you
    • conflict example shown: CONFLICT (submodule): Merge conflict in <submodule>
  • Diagnose SHAs
    • git diff on the superproject shows both submodule commit IDs
  • Manual resolution flow (shown)
    • enter submodule
    • create a branch pointing to the other sides SHA (e.g., try-merge)
    • merge it, resolve conflicts, commit in submodule
    • return to superproject
    • git add <submodule> to record resolved submodule pointer
    • commit superproject merge
  • Alternative case: Git suggests an existing submodule merge commit
    • it may print a “possible merge resolution” SHA and a suggested git update-index --cacheinfo 160000 <sha> <path>
    • recommended approach still: verify in submodule, fast-forward/merge, then git add + commit
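
The fast-forward versus divergent distinction comes down to an ancestry test. The sketch below models it on a toy commit graph. The graph, the commit names, and both function names are hypothetical, and the code illustrates the rule rather than Git's merge machinery.

```python
def ancestors(parents, commit):
    """Return the set of commits reachable from `commit`, itself included."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def resolve_submodule_pointer(parents, ours, theirs):
    """Pick the later commit when one side contains the other, else None."""
    if ours in ancestors(parents, theirs):
        return theirs  # theirs is ahead of ours: fast-forward to it
    if theirs in ancestors(parents, ours):
        return ours    # ours already contains theirs
    return None        # histories diverged: resolve manually in the submodule


# Toy submodule history: A <- B, then C and D both branch off B.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
```

With this graph, resolving B against C yields C, while resolving C against D yields None, which corresponds to the CONFLICT (submodule) case.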

Submodule tips

  • Run commands in each submodule
    • git submodule foreach '<cmd>'
    • examples shown
      • stash across all: git submodule foreach 'git stash'
      • create branch across all: git submodule foreach 'git checkout -b <branch>'
      • unified diffs: git diff; git submodule foreach 'git diff'
  • Useful aliases (examples)
    • sdiff = diff superproject + each submodule diff
    • spush = push with --recurse-submodules=on-demand
    • supdate = submodule update --remote --merge

Issues with submodules

  • Switching branches (older Git < 2.13)
    • switching to branch without submodule leaves submodule directory as untracked
    • cleanup needed: git clean -ffdx
    • switching back requires git submodule update --init
  • Newer Git (≥ 2.13)
    • git checkout --recurse-submodules <branch> keeps submodules consistent when switching
    • you can default recursion: git config submodule.recurse true
  • Switching from subdirectories to submodules
    • if a directory is already tracked, git submodule add fails (“already exists in the index”)
    • fix: git rm -r <dir> first, then git submodule add ...
    • switching back to a branch where the files are tracked directly (not as a submodule) can fail because untracked working tree files would be overwritten
      • can force with git checkout -f (danger: overwrites unsaved changes)
    • switching back to the submodule branch may then leave an empty submodule directory; run git checkout . inside the submodule to restore the files
  • Storage note (modern Git)
    • submodule Git data stored in superprojects .git directory
    • deleting submodule working directory wont lose commits/branches

Bundling (git bundle)

  • Purpose
    • transfer Git data without network protocols (HTTP/SSH)
  • Use cases
    • no network
    • offsite/security constraints
    • broken networking hardware
    • email/USB transfer
  • Create bundle
    • git bundle create <file.bundle> <ref_or_range>...
    • must list each reference/range to include
    • to be cloneable, include HEAD plus branch (example: HEAD master)
  • Clone from bundle
    • git clone <bundle> <dir>
    • if HEAD not included, may need -b <branch> to choose checkout branch
  • Incremental bundles (send only new commits)
    • you must compute the range manually (unlike network push)
    • range examples used
      • origin/master..master
      • master ^origin/master
    • create incremental bundle example pattern
      • git bundle create <bundle> master ^<known_base_commit>
  • Inspect / validate bundles
    • verify bundle and prerequisites
      • git bundle verify <bundle>
    • list heads
      • git bundle list-heads <bundle>
  • Import
    • fetch from bundle to a local branch
      • git fetch <bundle> <bundleBranch>:<localBranch>
    • inspect graph with git log --graph --all
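
The range you hand to an incremental bundle is a set difference: everything reachable from the included ref minus everything reachable from the excluded one. A minimal model on a toy history follows. The commit names and functions are hypothetical, and the real computation is done by Git itself.

```python
def reachable(parents, tip):
    """Return every commit reachable from `tip`, including `tip`."""
    seen = set()
    stack = [tip]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def commit_range(parents, include, exclude):
    """Model `include ^exclude` (equivalently `exclude..include`)."""
    return reachable(parents, include) - reachable(parents, exclude)


# Toy linear history c1 <- c2 <- c3 <- c4.
PARENTS = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"]}
REFS = {"origin/master": "c2", "master": "c4"}

to_bundle = commit_range(PARENTS, REFS["master"], REFS["origin/master"])
```

Here to_bundle contains c3 and c4. The excluded tip c2 plays the role of the prerequisite that the receiving repository must already have, which is what git bundle verify checks.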

Replace (git replace)

  • Core idea
    • Git objects are immutable, but replace lets Git pretend object A is object B
    • “when you refer to X, use Y instead”
  • Common use
    • replace a commit without rewriting entire history (vs filter-branch)
    • graft histories together (short “recent” history + longer “historical” history)

Example: grafting history without rewriting all SHA-1s

  • Split repository into:
    • historical repo (commits 1→4)
    • truncated recent repo (commits 4→5 + an “instructions” base commit)
  • Tools used (as shown)
    • create historical branch and push to another remote
    • truncate recent history by creating a parentless base commit with plumbing
      • git commit-tree <commit>^{tree} (creates new commit from a tree)
    • rebase onto that base commit
      • git rebase --onto <newBase> <splitPointCommit>
  • Recombine in a clone
    • fetch both remotes
    • git replace <recent_fourth_sha> <historical_fourth_sha>
  • Effects/notes
    • git log shows full history
    • SHA displayed remains the original (the one being “replaced”), but content comes from replacement
    • git cat-file -p <old> shows replaced data (including different parent)
    • replacement stored as a ref
      • refs/replace/<oldsha>
    • can share by pushing that ref
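
The effect of a replacement on history traversal can be sketched with a toy object store. Everything here is illustrative: the commit IDs (h1 to h4 for the historical repo; base, r4, r5 for the truncated one), the single-parent commits, and the function names are assumptions, not Git internals.

```python
def read_commit(objects, replacements, sha):
    """Look up a commit, honoring a replacement if one is registered."""
    return objects[replacements.get(sha, sha)]


def log(objects, replacements, tip):
    """Walk parents from `tip`, listing IDs as the caller referred to them."""
    history = []
    sha = tip
    while sha is not None:
        history.append(sha)
        sha = read_commit(objects, replacements, sha)["parent"]
    return history


OBJECTS = {
    # historical repository: commits 1 to 4
    "h1": {"parent": None}, "h2": {"parent": "h1"},
    "h3": {"parent": "h2"}, "h4": {"parent": "h3"},
    # truncated repository: parentless "instructions" base, then 4 and 5
    "base": {"parent": None}, "r4": {"parent": "base"}, "r5": {"parent": "r4"},
}

# Equivalent in spirit to: git replace <recent_fourth_sha> <historical_fourth_sha>
REPLACEMENTS = {"r4": "h4"}

truncated = log(OBJECTS, {}, "r5")
grafted = log(OBJECTS, REPLACEMENTS, "r5")
```

Without the replacement the walk stops at base. With it, r4 is still listed under its own ID, but its parent is taken from h4, so the walk continues into the historical commits.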

Credential Storage

Problem being solved

  • SSH can use keys (possibly no passphrase) → no repeated prompts
  • HTTP(S) needs a username and password for every connection
    • 2FA tokens make passwords harder to type/manage

Built-in credential helper approaches

  • No caching (default)
    • prompts every connection
  • cache
    • stores credentials in memory
    • not written to disk
    • purges after timeout (default 15 minutes / 900s)
  • store
    • writes credentials to plain-text file (default ~/.git-credentials)
    • never expires
    • downside: cleartext password on disk
  • macOS keychain helper (osxkeychain)
    • stores encrypted in system keychain; persists
  • Git Credential Manager (Windows/macOS/Linux)
    • uses platform-native secure stores

Configuration

  • Set helper
    • git config --global credential.helper <helper>
  • Helper options
    • store file location
      • git config --global credential.helper 'store --file <path>'
    • cache timeout
      • git config --global credential.helper 'cache --timeout <seconds>'
  • Multiple helpers
    • Git queries helpers in order until one returns credentials
    • When saving, Git sends creds to all helpers (each decides what to do)
  • Example .gitconfig pattern shown
    • thumbdrive store + memory cache fallback
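
The lookup and save rules for multiple helpers can be modeled as follows. This is a toy sketch: MemoryHelper, fill, and approve are hypothetical stand-ins for real helper programs, and the host and credential values are placeholders.

```python
class MemoryHelper:
    """Stand-in for one credential helper, backed by a dict."""

    def __init__(self, name):
        self.name = name
        self.saved = {}

    def get(self, host):
        return self.saved.get(host)

    def store(self, host, credential):
        self.saved[host] = credential


def fill(helpers, host):
    """Ask helpers in configured order and stop at the first answer."""
    for helper in helpers:
        credential = helper.get(host)
        if credential is not None:
            return helper.name, credential
    return None, None  # at this point Git would prompt the user


def approve(helpers, host, credential):
    """Offer the credential to every helper; each decides what to keep."""
    for helper in helpers:
        helper.store(host, credential)


# Mirrors the pattern above: a thumbdrive store first, a memory cache second.
thumbdrive = MemoryHelper("store")
cache = MemoryHelper("cache")
chain = [thumbdrive, cache]
```

If only the cache knows a host, fill falls through to it. After approve, both helpers hold the credential and the first one in the list answers.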

Under the hood (git credential)

  • Gits root credential command
    • git credential <action>
    • communicates via stdin/stdout key-value protocol
  • Example action explained: fill
    • Git provides what it knows (e.g., protocol, host)
    • blank line ends input
    • credential system outputs what it found (username/password)
    • if unknown, Git prompts user and outputs what user entered
  • How helpers are invoked (forms)
    • foo → runs git-credential-foo
    • foo -a --opt=bcd → runs git-credential-foo -a --opt=bcd
    • /absolute/path/foo -xyz → runs that program
    • !<shell> → executes shell code
  • Helper action set (slightly different terms)
    • get request credentials
    • store save credentials
    • erase remove credentials
  • Output rules
    • for get: helper may output additional key=value lines (overriding existing)
    • for store/erase: output ignored
  • git-credential-store example shown
    • store: git credential-store --file <file> store
    • get: git credential-store --file <file> get
    • file format: credential-decorated URL per line
      • https://user:pass@host
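
The helper invocation forms listed above amount to a three-way dispatch on the configured string. A minimal sketch, assuming POSIX-style absolute paths; the function name and the kind labels are illustrative, not part of Git:

```python
def resolve_helper(value):
    """Return (kind, command) for a credential.helper config value."""
    if value.startswith("!"):
        return "shell", value[1:]        # code after "!" goes to the shell
    if value.startswith("/"):
        return "program", value          # absolute path is run verbatim
    return "git-helper", "git-credential-" + value  # short name plus options
```

For example, resolve_helper("foo -a --opt=bcd") yields the command git-credential-foo -a --opt=bcd, matching the second form in the list.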

Custom credential helper example: read-only shared store

  • Use case described
    • team-shared credentials in shared directory
    • dont want to copy to personal credential store
    • credentials change often
  • Requirements (as listed)
    • only handle get; ignore store/erase
    • read git-credential-store-compatible file format
    • allow configurable path (--file)
  • Implementation outline shown (Ruby)
    • parse options
    • exit unless action is get and file exists
    • read stdin key=value pairs until blank line
    • scan credential file; match on protocol/host/username
    • output protocol/host/username/password if found
  • Configure with helper short name
    • git config --global credential.helper 'read-only --file <shared_path>'
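
The outline above describes a Ruby implementation. The following is a Python sketch of the same steps, not a port of the original script. The function names are hypothetical, percent-decoding of stored values is omitted, and one behavior is an assumption: the username is only compared when the request supplies one.

```python
import os
import re

# One stored credential per line, e.g. https://user:pass@host
STORE_LINE = re.compile(r"^(.*?)://(.*?):(.*?)@(.*)$")


def parse_request(lines):
    """Read key=value lines until the first blank line."""
    known = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            break
        key, _, value = line.partition("=")
        known[key] = value
    return known


def lookup(store_lines, known):
    """Return response lines for the first matching stored credential."""
    for entry in store_lines:
        match = STORE_LINE.match(entry.strip())
        if not match:
            continue
        protocol, user, password, host = match.groups()
        if protocol != known.get("protocol") or host != known.get("host"):
            continue
        # Assumption: compare usernames only when the request names one.
        if "username" in known and user != known["username"]:
            continue
        return [
            "protocol=" + protocol,
            "host=" + host,
            "username=" + user,
            "password=" + password,
        ]
    return []


def run(action, path, request_lines):
    """Handle one helper invocation; only `get` produces output."""
    if action != "get" or not os.path.exists(path):
        return []  # store and erase are ignored: the store is read-only
    with open(path) as handle:
        return lookup(handle.readlines(), parse_request(request_lines))
```

A real helper would take the action and --file path from its command line, pass sys.stdin as request_lines, and print each returned line to stdout.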

Chapter Summary

  • You now have advanced tools to:
    • select commits and ranges precisely
    • stage/commit partial changes interactively
    • temporarily shelve work (stash) and safely remove untracked artifacts (clean)
    • sign and verify tags/commits with GPG, and optionally enforce signed merges
    • search code and history efficiently (grep, log pickaxe/regex, line history)
    • rewrite local history confidently (amend, interactive rebase; filter-branch caveats)
    • understand reset and checkout via the three-tree model
    • handle complex merges (whitespace strategies, manual merges, combined diffs, undo merges, subtree merges, rerere)
    • debug regressions (blame, bisect, automated bisect runs)
    • manage nested dependencies with submodules (setup, update, push safety, conflicts, tips, caveats)
    • transfer Git data offline (bundles)
    • “graft” history with virtual object replacement (replace)
    • manage credentials with helpers (including writing your own)