# Git Tools

## Purpose / Context

- You already know day-to-day Git workflows
  - track + commit files
  - staging area
  - topic branching + merging
- This chapter: powerful/advanced tools you might not use every day, but will eventually need

## Revision Selection

- Git can refer to:
  - a single commit
  - a set of commits
  - a range of commits
- References can be:
  - hashes (full/short)
  - branch names
  - reflog entries
  - ancestry expressions
  - range expressions

### Single Revisions

- Full SHA-1
  - 40-character commit hash (e.g., from `git log`)
- Short SHA-1 (abbreviated hash)
  - Git accepts a prefix of the SHA-1 if:
    - at least 4 characters
    - unambiguous among all objects in the object database
  - Inspect a commit (examples; any unique prefix works)
    - `git show <full-sha>`
    - `git show <unique-prefix>`
  - Generate abbreviated hashes in log output
    - `git log --abbrev-commit --pretty=oneline`
    - defaults to 7 characters; lengthens as needed to remain unique
  - Practical uniqueness
    - 8–10 characters are often enough within a repo
    - example note: very large repos still have unique prefixes (Linux kernel cited)
- Note: SHA-1 collision concerns (and Git’s direction)
  - SHA-1 digest: 20 bytes / 160 bits
  - Random collisions are astronomically unlikely
    - 50% collision probability requires about 2^80 randomly-hashed objects
    - probability formula cited: `p = (n(n-1)/2) * (1/2^160)`
  - If a collision happened organically:
    - Git would reuse the first object with that hash (you’d always get the first object’s data)
  - Deliberate, synthesized collisions are possible (e.g., shattered.io, Feb 2017)
  - Git is moving toward SHA-256 as the default hash algorithm
    - more resilient to collision attacks
    - mitigation code exists, but cannot fully eliminate attacks
- Branch References
  - If a commit is the tip of a branch, you can refer to it by branch name
    - `git show <branch>`
    - equivalent to `git show <sha-of-branch-tip>`
  - Plumbing tool to resolve refs → SHA-1: `git rev-parse`
    - example: `git rev-parse topic1`
    - purpose: lower-level operations (not typical day-to-day),
      but useful for “what is this ref really?”
- Reflog Shortnames
  - Git records a reflog (local history of where HEAD/refs have pointed)
  - View reflog
    - `git reflog`
    - shows entries like `HEAD@{0}`, `HEAD@{1}`, …
  - Refer to older values
    - `git show HEAD@{5}` (the 5th prior HEAD value in reflog)
  - Time-based reflog syntax
    - `git show master@{yesterday}`
  - Log-format reflog output
    - `git log -g <ref>` (e.g., `git log -g master`)
  - Important properties / limitations
    - reflog is **strictly local**
      - not shared; differs from other clones
      - freshly cloned repo starts with empty reflog (no local activity yet)
    - retention is limited (typically a few months)
      - time lookups only work while data remains in reflog
  - Mental model
    - reflog ≈ “shell history” for Git refs (personal/session-local)
  - PowerShell gotcha: escaping braces `{ }`
    - `git show HEAD@{0}` (won’t work)
    - ``git show HEAD@`{0`}`` (OK)
    - `git show "HEAD@{0}"` (OK)
- Ancestry References
  - Caret `^` (parent selection)
    - `ref^` = parent of `ref`
    - example: `HEAD^` = parent of HEAD
    - Windows cmd.exe gotcha: escaping `^`
      - `git show "HEAD^"` or `git show HEAD^^`
    - Selecting merge parents
      - `ref^2` = second parent (merge commits only)
      - first parent: branch you were on when merging (often `master`)
      - second parent: branch being merged in (topic branch)
  - Tilde `~` (first-parent traversal)
    - `ref~` ≡ `ref^` (first parent)
    - `ref~2` = first-parent-of-first-parent (grandparent)
    - repeated tildes: `HEAD~~~` ≡ `HEAD~3`
  - Combining ancestry operators
    - example: `HEAD~3^2` = second parent of the commit found via `HEAD~3` (if that commit is a merge)

### Commit Ranges

- Motivation / questions answered
  - “What work is on this branch that hasn’t been merged into main?”
  - “What am I about to push?”
  - “What’s unique between two lines of development?”

#### Double Dot (`A..B`)

- Meaning
  - commits reachable from `B` **but not** reachable from `A`
- Example uses
  - “what’s in experiment not in master?”
    - `git log master..experiment`
  - opposite
    direction (what’s in master not in experiment)
    - `git log experiment..master`
  - “what am I about to push?”
    - `git log origin/master..HEAD`
- Omitted side defaults to `HEAD`
  - `git log origin/master..` ≡ `git log origin/master..HEAD`

#### Multiple Points (`^` / `--not`)

- Double-dot is shorthand for a common two-point case
- Equivalent forms
  - `git log refA..refB`
  - `git log ^refA refB`
  - `git log refB --not refA`
- Advantage: can exclude multiple refs
  - “reachable from refA or refB, but not from refC”
    - `git log refA refB ^refC`
    - `git log refA refB --not refC`

#### Triple Dot (`A...B`)

- Meaning (symmetric difference)
  - commits reachable from either `A` or `B` **but not both**
- Example
  - `git log master...experiment`
- Often paired with `--left-right`
  - `git log --left-right master...experiment`
  - marks which side each commit is from (`<` vs `>`)

## Interactive Staging

- Goal
  - craft commits that contain only certain combinations/parts of changes
  - split large messy changes into focused, reviewable commits

### Interactive add mode

- Start
  - `git add -i` / `git add --interactive`
- What it shows
  - staged vs unstaged changes per path (like `git status`, but compact)
- Core commands menu (as shown)
  - `s` status
  - `u` update (stage files)
  - `r` revert (unstage files)
  - `a` add untracked
  - `p` patch (stage hunks)
  - `d` diff (review staged diff)
  - `q` quit
  - `h` help

### Staging and unstaging files (interactive)

- Stage files
  - `u` / `update`
  - select by numbers (comma-separated)
  - `*` indicates selected items
  - press Enter on an empty prompt to stage everything currently selected
- Unstage files
  - `r` / `revert`
  - select paths to remove from index
- Review staged diff
  - `d` / `diff`
  - select file(s) to see
  - comparable to `git diff --cached`

### Staging patches (partial-file staging)

- Enter patch selection
  - from interactive prompt: `p` / `patch`
  - from command line: `git add -p` / `git add --patch`
- Git presents hunks and asks whether to stage each
- Hunk prompt options (as
  listed)
  - `y` stage this hunk
  - `n` do not stage this hunk
  - `a` stage this and all remaining hunks in file
  - `d` do not stage this hunk nor any remaining hunks in file
  - `g` select a hunk to go to
  - `/` search for a hunk matching a regex
  - `j` leave this hunk undecided, go to next undecided hunk
  - `J` leave this hunk undecided, go to next hunk
  - `k` leave this hunk undecided, go to previous undecided hunk
  - `K` leave this hunk undecided, go to previous hunk
  - `s` split current hunk into smaller hunks
  - `e` manually edit the current hunk
  - `?` help
- Result
  - a file can be partially staged (some staged, some unstaged)
  - exit and `git commit` will commit staged parts only
- Patch mode appears in other commands too
  - `git reset --patch` (partial unstage/reset)
  - `git checkout --patch` (partial checkout/revert)
  - `git stash save --patch` (stash parts; mentioned as further detail later)

## Stashing and Cleaning

### Stash: why and what it does

- Problem
  - need to switch branches while work is half-done
  - don’t want to commit unfinished work
- `git stash` saves:
  - modified tracked files (working directory)
  - staged changes (index)
- Stores changes on a stack; can reapply later (even on different branch)
- Note: migration to `git stash push`
  - `git stash save` discussed as being deprecated in favor of `git stash push`
  - key reason: `push` supports stashing selected pathspecs

### Stashing your work (basic flow)

- Observe dirty state
  - `git status` shows staged + unstaged changes
- Create stash
  - `git stash` or `git stash push`
  - working directory becomes clean
- List stashes
  - `git stash list` (e.g., `stash@{0}`, `stash@{1}`, …)
- Apply stash
  - most recent: `git stash apply`
  - specific: `git stash apply stash@{2}`
  - can apply on different branch
  - conflicts possible if changes don’t apply cleanly
- Restore staged state too
  - `git stash apply --index`
- Remove stashes
  - drop by name: `git stash drop stash@{0}`
  - apply + drop: `git stash pop`

### Creative stashing (useful
options)

- Keep staged changes in index
  - `git stash --keep-index`
  - staged content is still included in the stash, but is also left in the index
- Include untracked files
  - `git stash -u` / `git stash --include-untracked`
- Include ignored files too
  - `git stash --all` / `git stash -a`
- Patch stashing (stash some hunks, keep others)
  - `git stash --patch`
  - interactive hunk selection (per the stash prompt, options include `y`, `n`, `q`, `a`, `d`, `/`, `e`, `?`)

### Create a branch from a stash

- Use case
  - stash is old; applying on current branch causes conflicts
- Command
  - `git stash branch <new-branchname>`
- Behavior
  - creates a new branch at the commit you were on when stashing
  - checks it out
  - reapplies stash there
  - drops stash if it applies successfully

### Cleaning your working directory (`git clean`)

- Purpose
  - remove untracked files/dirs (“cruft”)
  - remove build artifacts for clean build
- Caution
  - removes files not tracked by Git
  - often no way to recover
  - safer alternative when unsure: `git stash --all`
- Common usage
  - preview only: `git clean -n` / `git clean --dry-run`
  - remove untracked files + empty dirs:
    - `git clean -f -d`
    - `-f` required unless `clean.requireForce=false`
- Ignored files
  - default: ignored files are NOT removed
  - remove ignored too: `git clean -x`
- Interactive cleaning
  - `git clean -x -i`
  - interactive commands shown:
    - clean
    - filter by pattern
    - select by numbers
    - ask each
    - quit
    - help
- Quirk (nested Git repos)
  - directories containing other Git repos may require extra force
  - may need a second `-f` (e.g., `git clean -ffd`)

## Signing Your Work (GPG)

- Git is cryptographically secure (hashing), but not foolproof for trust
- When consuming work from others, signing helps verify authorship/integrity

### GPG setup

- List keys: `gpg --list-keys`
- Generate key: `gpg --gen-key`
- Configure Git signing key
  - `git config --global user.signingkey <key-id>`

### Signing tags

- Create signed tag
  - `git tag -s <tagname> -m '<message>'` (instead of `-a`)
- View signature
  - `git show <tagname>`
- Passphrase may be required to
  unlock key

### Verifying tags

- Verify signed tag
  - `git tag -v <tagname>`
- Requires signer’s public key in your keyring
  - otherwise: “public key not found” / cannot verify

### Signing commits

- Sign a commit (Git v1.7.9+)
  - `git commit -S ...`
- View/check signatures
  - `git log --show-signature -1`
  - signature status in custom format: `git log --pretty="format:%h %G? %aN %s"`
  - example statuses shown in chapter:
    - `G` = good/valid signature
    - `N` = no signature

### Enforcing signatures in merges/pulls (Git v1.8.3+)

- Verify signatures during merge/pull
  - `git merge --verify-signatures <branch>`
  - merge fails if commits are unsigned/untrusted
- Verify + sign resulting merge commit
  - `git merge --verify-signatures -S <branch>`

### Workflow consideration: everyone must sign

- If you require signing:
  - ensure all contributors know how to do it
  - otherwise you’ll spend time helping rewrite commits to signed versions
- Understand GPG + benefits before adopting as standard workflow

## Searching

### `git grep` (search code)

- Search targets
  - working directory (default)
  - committed trees
  - index (staging area)
- Useful options
  - line numbers: `-n` / `--line-number`
  - per-file match counts: `-c` / `--count`
  - show enclosing function: `-p` / `--show-function`
- Complex queries
  - combine expressions on same line with `--and`
  - multiple `-e <pattern>` expressions
  - can search historical trees (example in chapter uses tag `v1.8.0`)
  - output readability helpers: `--break`, `--heading`
- Advantages vs external tools (grep/ack)
  - very fast
  - can search any Git tree, not just current checkout

### `git log` searching (by content)

- Find when a string was introduced/changed (diff-based search)
- Pickaxe (`-S`)
  - `git log -S <string>`
  - shows commits that changed the number of occurrences of the string
- Regex diff search (`-G`)
  - `git log -G <regex>`

### Line history search (`git log -L`)

- Show history of a function/line range as patches
- Function syntax
  - `git log -L :<funcname>:<file>`
- Regex/range alternatives if function parsing fails
  - regex
    and end pattern: `git log -L '/<start-regex>/',/^}/:<file>`
  - explicit line ranges or a single line number also supported (noted)

## Rewriting History

### Why rewrite history (locally)

- Make history reflect logical, reviewable changes
  - reorder commits
  - rewrite messages
  - modify commit contents
  - squash/split commits
  - remove commits entirely
- Cardinal rule
  - don’t push until you’re happy
  - rewriting pushed history confuses collaborators (treat pushed as final unless strong reason)

### Changing the last commit

- Amend message and/or content
  - `git commit --amend`
- Common patterns
  - fix message only: amend, edit message in editor
  - fix content:
    - edit files → stage changes → `git commit --amend`
- Caution
  - amending changes SHA-1 (like small rebase)
  - don’t amend a commit that’s already pushed
- Tip: avoid editor if message unchanged
  - `git commit --amend --no-edit`
- Note: commit message may need updating if content changes substantially

### Changing multiple commit messages (interactive rebase)

- Tool: interactive rebase
  - `git rebase -i <base>`
- Choosing the range
  - specify the parent of the oldest commit you want to edit
  - example for last 3 commits: `git rebase -i HEAD~3`
- Warning
  - rewrites every commit in selected range and descendants
  - avoid rewriting commits already pushed
- Interactive todo list properties
  - commits listed oldest→newest (reverse of typical `git log` output)
  - Git replays commits top→bottom
- Todo commands shown
  - `pick` use commit
  - `reword` use commit, edit message
  - `edit` stop for amending
  - `squash` meld into previous, edit combined message
  - `fixup` like squash, discard this commit message
  - `exec` run shell command
  - `break` stop here, continue later with `git rebase --continue`
  - `drop` remove commit
  - `label` label current HEAD
  - `reset` reset HEAD to a label
  - `merge` create merge commit (with options to keep/reword message)
  - notes shown in template:
    - lines can be re-ordered
    - removing a line loses that commit
    - removing everything aborts rebase
    -
      empty commits commented out

### Reordering commits (interactive rebase)

- Reorder lines in todo file
- Save + exit
  - Git rewinds branch to parent of the todo range
  - replays commits in new order

### Removing commits (interactive rebase)

- Delete the line or mark it `drop`
- Effects
  - rewriting a commit rewrites all following commits’ SHA-1s
  - can cause conflicts if later commits depend on removed one

### Squashing commits

- Mark subsequent commits as `squash` (or `fixup`)
- Git:
  - applies changes together
  - opens editor to combine messages (except fixup discards message)
- Outcome
  - a single commit replacing multiple commits

### Splitting a commit

- Mark target commit as `edit` in rebase todo
- When rebase stops at that commit
  - undo that commit while keeping its changes in the working tree (unstaged)
    - `git reset HEAD^` (mixed reset)
  - stage and commit portions into multiple commits
  - continue rebase
    - `git rebase --continue`
- Reminder
  - rewriting changes SHA-1s of affected commit and subsequent commits
  - avoid if any are pushed

### Aborting or recovering

- Abort in-progress rebase
  - `git rebase --abort`
- After completing, recover earlier state
  - use reflog (chapter references this as Data Recovery elsewhere)

### The nuclear option: `filter-branch`

- Purpose
  - scriptable rewriting across many commits
  - examples:
    - remove file from every commit
    - change email globally
    - rewrite project root from subdirectory
- Warning callout
  - `git filter-branch` has many pitfalls; no longer recommended
  - prefer `git-filter-repo` (Python) for most use cases
- Common uses shown
  - Remove a file from every commit (e.g., secrets/huge binaries)
    - `git filter-branch --tree-filter 'rm -f passwords.txt' HEAD`
    - `--tree-filter` runs command after each checkout; recommits results
    - can use patterns (e.g., `rm -f *~`)
    - to run across all branches: `--all`
    - recommended: test in a branch, then hard-reset master if satisfied
  - Make a subdirectory the new root
    - `git filter-branch
      --subdirectory-filter trunk HEAD`
    - auto-removes commits that didn’t affect the subdirectory
  - Change email addresses globally (only yours)
    - `git filter-branch --commit-filter '