# Git and Other Systems (Chapter 7)
## Chapter purpose / big picture
- Reality check: you can't always switch every project to Git immediately
- Two major goals
  - Use Git locally while the “official” repository lives in another VCS (Git as a client)
  - Migrate/convert an existing repository from another VCS into Git (Migrating to Git)
- Key idea: “bridges/adapters” let Git interoperate with centralized or other DVCS systems
- Recurring caveat theme throughout
  - Different VCS have different data models (linear history vs merge history, tags/branches semantics, etc.)
  - Bridges often require constraints (e.g., keep history linear, avoid rewriting)

## Part 1 — Git as a Client (working with non-Git servers)
### What “bridges” enable
- Keep Git's local UX (branching, merging, staging, rebase, cherry-pick, etc.)
- Collaborators can keep using their existing VCS server + client tools
- Often useful as an incremental adoption path (“sneak Git in”)

### Git and Subversion (SVN) — `git svn`
#### Background: why SVN matters
- Widely used in open source + corporate environments
- Longstanding “default” centralized VCS for many projects
- Similar lineage to CVS
- SVN constraints that influence workflows
  - Centralized, linear, single “official” history
  - Merges recorded differently than in Git (and often more limited)

#### Bridge overview: `git svn`
- Bidirectional bridge to an SVN server
- Lets you
  - Work locally with Git features (branches, merges, staging, rebase, cherry-pick)
  - Publish work back to SVN as if using SVN client
- Practical role
  - Helps teams gain Git productivity without server migration
  - Often a stepping stone (“gateway drug” to DVCS)

#### Mental model + rules of thumb (critical differences from pure Git)
- You are interacting with Subversion, not Git
- Best practices to avoid confusion
  - Keep history as linear as possible
    - Prefer rebasing over merging
    - Avoid merge commits in publishable history
  - Avoid simultaneously collaborating via a Git remote repository
    - Don't push to a parallel Git server and SVN at the same time
  - Don't rewrite history after publishing to SVN and then try to push again
- Team coordination guideline
  - If some devs use SVN clients and others use `git svn`, everyone should collaborate via the SVN server (single source of truth)

#### `git svn` command family (entry point)
- Base command: `git svn`
  - Provides many subcommands
  - Common ones shown through workflows

#### Setting up an SVN repo for examples (local writable mirror)
- Need an SVN repository with write access
- Tool used: `svnsync` (ships with Subversion)
- Create a new local SVN repository
  - `mkdir /tmp/test-svn`
  - `svnadmin create /tmp/test-svn`
- Enable changing revprops (revision properties)
  - Add hook: `/tmp/test-svn/hooks/pre-revprop-change`
    - Content:
      - `#!/bin/sh`
      - `exit 0;`
    - Make executable: `chmod +x /tmp/test-svn/hooks/pre-revprop-change`
- Initialize sync metadata
  - `svnsync init file:///tmp/test-svn http://your-svn-server.example.org/svn/`
- Sync revisions into the local mirror
  - `svnsync sync file:///tmp/test-svn`
  - Notes
    - Copies one revision at a time
    - Very inefficient (but simplest approach)
    - Remote-to-remote sync can take a long time even for smallish histories
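
The hook step above can be scripted. The following is a minimal sketch, assuming the repository directory was already created with `svnadmin create`; the function name `enable_revprop_changes` is hypothetical and not part of Subversion.

```shell
# Hypothetical helper: install a permissive pre-revprop-change hook
# into an existing SVN repository directory.
enable_revprop_changes() {
    repo_dir=$1
    hook="$repo_dir/hooks/pre-revprop-change"

    # Refuse to guess: the hooks directory must already exist.
    [ -d "$repo_dir/hooks" ] || {
        echo "not an SVN repository: $repo_dir" >&2
        return 1
    }

    # The hook accepts every revision-property change, which svnsync needs.
    printf '#!/bin/sh\nexit 0\n' > "$hook"
    chmod +x "$hook"
}

# Usage (path taken from the example above):
#   enable_revprop_changes /tmp/test-svn
```

A hook that always exits 0 is only appropriate for a throwaway mirror like this one; a shared repository would normally restrict which properties may change.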

#### Getting started: importing SVN into a Git repo
- Clone/import SVN repository
  - Full layout options:
    - `git svn clone file:///tmp/test-svn -T trunk -b branches -t tags`
  - Standard layout shorthand:
    - `git svn clone file:///tmp/test-svn -s`
- What this does under the hood
  - Equivalent to:
    - `git svn init` then `git svn fetch`
  - Performance note
    - Git must check out each SVN revision sequentially and commit it
    - 100s/1000s of commits can take hours or days
- Layout flags meaning
  - `-T trunk` → trunk directory name
  - `-b branches` → branches directory name
  - `-t tags` → tags directory name
  - `-s` → “standard layout” (implies all of the above)
  - Customize if SVN repo uses nonstandard paths

#### Resulting refs: branches/tags as seen in Git
- Inspect imported refs
  - `git branch -a`
  - `git show-ref`
- Important nuance: SVN tags handled as remote refs
  - `git svn` imports SVN tags as remote refs under:
    - `refs/remotes/origin/tags/...`
  - Contrast: native Git clone stores tags directly under:
    - `refs/tags/...`
- Practical implication
  - You'll often want post-import cleanup if migrating permanently (covered later)

#### Committing back to Subversion
- Local Git commit
  - Example: `git commit -am 'Adding git-svn instructions to the README'`
- Publish to SVN
  - `git svn dcommit`
- What `dcommit` does (key behavior)
  - Takes each local commit atop SVN's tip and commits it to SVN one-by-one
  - Rewrites your local Git commits after publishing
    - Adds a `git-svn-id` line to each commit message
    - Changes SHA-1s for the commits (history rewritten locally)
- Consequence: “SVN first” if dual-publishing
  - If you must push to both SVN and a Git server:
    - `dcommit` to SVN first, then push to Git
    - Because `dcommit` changes commit data

#### Pulling in new changes (keeping in sync with SVN)
- Symptom: `dcommit` rejected because SVN has advanced
  - Error example: “Transaction is out of date”
- Resolution: rebase against SVN
  - `git svn rebase`
    - Fetches changes from SVN you don't have yet
    - Rebases your local commits on top of updated SVN tip
    - May involve conflict resolution
- After rebase
  - `git svn dcommit` should succeed
- Behavior difference vs Git server
  - Git requires integrating upstream before push (always)
  - `git svn` makes you integrate only when conflicts occur (SVN-like)
    - Non-conflicting edits in different files may still allow `dcommit`
    - But `git svn` may still perform a rebase internally
- Critical caveat: published state may be “untestable” locally
  - Because SVN accepts sequential commits without requiring a full pre-tested merged state
  - Resulting repo state may not have existed on any client machine
  - Can yield subtle incompatibilities
- Keeping updated routinely
  - Prefer `git svn rebase` periodically
    - Does fetch + updates your branch
  - Working directory must be clean
    - Stash or temporarily commit local changes before rebasing
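
The "clean working directory" requirement can be wrapped in a helper. The sketch below is one possible approach, not a `git svn` feature: the name `svn_sync` is hypothetical, and it assumes `git stash push` is available (Git 2.13 or later).

```shell
# Hypothetical helper: stash tracked changes, run `git svn rebase`, restore them.
svn_sync() {
    stashed=0
    # Only tracked changes block the rebase; untracked files are left alone.
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        git stash push -m "before git svn rebase" || return 1
        stashed=1
    fi

    if ! git svn rebase; then
        echo "git svn rebase stopped; finish it, then run 'git stash pop' if needed" >&2
        return 1
    fi

    if [ "$stashed" -eq 1 ]; then
        git stash pop
    fi
}
```

If the rebase stops on a conflict, the stash is deliberately left in place so nothing is restored on top of a half-finished rebase.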

#### Git branching issues when SVN is the server
- Git encourages topic branches + merges
- With `git svn`, prefer rebasing topic work onto mainline
  - Why
    - SVN has linear history and doesn't model merges like Git
    - `git svn` conversion follows only the first parent when turning Git history into SVN commits
- If you `dcommit` a merged history
  - `dcommit` will succeed, but…
  - Only the merge commit gets rewritten; the original topic-branch commits won't appear individually in SVN history
  - Others cloning will see a “squashed” result
    - Similar to `git merge --squash`
    - Lose detailed commit provenance/timing from topic branch

#### Subversion branching with `git svn`
##### Creating a new SVN branch
- Command: `git svn branch <new-branch>`
  - Example: `git svn branch opera`
- What it does
  - Equivalent to `svn copy trunk branches/opera`
  - Operates on the SVN server
- Common gotcha
  - It does NOT switch your working directory to the new branch
  - If you commit now, you still commit to SVN trunk (not the new branch)

##### Switching active branches / targeting `dcommit`
- How `dcommit` decides where to commit
  - Looks for tip of an SVN branch (git-svn-id) in your history
  - Assumption: there should be only one, and it should be the last git-svn-id in your current branch history
- Working on multiple SVN branches simultaneously (Git-side strategy)
  - Create local Git branches rooted at the corresponding imported SVN refs
  - Example:
    - `git branch opera remotes/origin/opera`

##### Merging SVN branches using Git
- You can merge locally with `git merge`
  - Example: merge `opera` into trunk (master)
- Provide a meaningful merge commit message
  - Use `-m` to avoid generic “Merge branch opera”
- After `dcommit`
  - SVN can't store true merge-parent info
  - `dcommit` will squash merge history into a single commit in SVN
  - Merge ancestry info is erased → future merge-base calculations in Git become wrong
- Practical workaround / best practice
  - After merging a feature branch into trunk and `dcommit`ing:
    - delete the local feature branch (e.g., `opera`)
    - avoids later incorrect merges / confusion

#### SVN-like helper commands provided by `git svn`
##### SVN-style history
- Command: `git svn log`
- Properties
  - Runs offline (unlike `svn log` which queries server)
  - Shows only commits that have been committed to SVN (dcommitted)
  - Does not show:
    - local Git-only commits (not yet dcommitted)
    - new SVN commits created since last communication
  - Best thought of as “last known SVN commit state”

##### SVN annotation / blame
- Command: `git svn blame <file>`
  - Equivalent to `svn annotate`
- Same limitations as `git svn log`
  - Offline
  - Only includes commits known as of last SVN interaction

##### SVN server information
- Command: `git svn info`
  - Equivalent to `svn info`
- Offline + last-known-state behavior

##### Ignoring what SVN ignores
- Problem
  - SVN ignores are often stored as `svn:ignore` properties
  - Git users want equivalent ignore behavior to avoid accidentally committing ignored files
- Tools
  - `git svn create-ignore`
    - Creates corresponding `.gitignore` files in working tree
    - Intended to be committed on next commit (if desired)
  - `git svn show-ignore`
    - Prints ignore rules (stdout)
    - Useful to keep ignores local-only:
      - `git svn show-ignore > .git/info/exclude`
        - Avoids committing `.gitignore` files
        - Useful if you're the only Git user and teammates don't want `.gitignore` artifacts in SVN repo

#### Git-SVN summary (what to remember)
- `git svn` is valuable when SVN server is unavoidable
- Treat it as “crippled Git”
  - Many Git workflows don't translate cleanly to SVN's linear model
- Safe-operating guidelines (to avoid confusing SVN / teammates)
  - Keep a linear Git history; avoid merge commits
    - Rebase topic work onto mainline; don't merge it
  - Don't collaborate using a parallel Git server
    - If you use a Git server for faster clones:
      - don't push commits lacking `git-svn-id`
      - consider a pre-receive hook to reject commits without `git-svn-id`
- If possible: migrate to a real Git server for full benefits
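
The pre-receive idea from the guidelines above could look roughly like this. It is a sketch under assumptions, not a tested production hook: the function name is hypothetical, and it relies only on the standard pre-receive input format (`<old> <new> <ref>` lines on stdin).

```shell
# Sketch of a pre-receive check for a Git mirror of an SVN project.
# Reads "<old> <new> <ref>" lines on stdin, as Git passes them to pre-receive.
reject_commits_without_svn_id() {
    status=0
    while read -r old new ref; do
        # An all-zero <new> means the ref is being deleted: nothing to check.
        case "$new" in *[!0]*) ;; *) continue ;; esac

        # For a brand-new ref (<old> all zeros), check only commits not yet in the repo.
        case "$old" in
            *[!0]*) commits=$(git rev-list "$old..$new") ;;
            *)      commits=$(git rev-list "$new" --not --all) ;;
        esac

        for commit in $commits; do
            if ! git log -1 --format=%B "$commit" | grep -q '^git-svn-id: '; then
                echo "rejected $ref: $commit has no git-svn-id (dcommit to SVN first)" >&2
                status=1
            fi
        done
    done
    return "$status"
}

# In hooks/pre-receive on the Git server, the script would end with:
#   reject_commits_without_svn_id
```

Because `dcommit` rewrites commits to add the `git-svn-id` line, any commit that passes this check has already been published to SVN.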

### Git and Mercurial (Hg) — `git-remote-hg`
#### Context
- DVCS ecosystem includes Git + others; Mercurial is most popular non-Git DVCS
- Git and Mercurial are conceptually similar → interoperability is relatively smooth

#### Bridge overview: remote helper `git-remote-hg`
- Project: https://github.com/felipec/git-remote-hg
- Implemented as a Git “remote helper”
  - Same general mechanism used by Git's HTTP/S remote support
- Benefit
  - Use standard Git commands (`clone`, `fetch`, `push`) against an Hg-backed remote

#### Installation checklist
- Install helper script into PATH
  - `curl -o ~/bin/git-remote-hg https://raw.githubusercontent.com/felipec/git-remote-hg/master/git-remote-hg`
  - `chmod +x ~/bin/git-remote-hg`
- Python dependency
  - Mercurial library for Python:
    - `pip install mercurial`
  - If Python not installed: install from https://www.python.org/
- Mercurial client
  - Install from https://www.mercurial-scm.org/

#### Getting started (example repository)
- Prepare Mercurial “server-side” repo (any Hg repo can be pushed to)
  - Example: hello world repo
    - `hg clone http://selenic.com/repo/hello /tmp/hello`
- Clone using Git (Hg remote helper prefix)
  - `git clone hg::/tmp/hello /tmp/hello-git`
- Verify history
  - `git log --oneline --graph --decorate`
  - You may see many refs displayed; helper creates multiple refs to represent Hg concepts

#### Under-the-hood mapping (how Git refs represent Hg concepts)
- Inspect actual refs on disk
  - `tree .git/refs`
- Key internal namespaces created by helper
  - `refs/hg/...`
    - Holds the “real” remote refs managed by helper
    - Separates:
      - Mercurial branches (e.g., `refs/hg/origin/branches/default`)
      - Mercurial bookmarks (e.g., `refs/hg/origin/bookmarks/master`)
  - `refs/notes/hg` (on disk: `.git/refs/notes/hg`)
    - Stores mapping between Git commit hashes and Mercurial changeset IDs
    - Implemented using Git notes (tree of mappings)
    - Concept
      - Key: Git commit SHA-1
      - Value: Mercurial changeset ID
- Practical takeaway
  - Most users can ignore these implementation details during normal workflows

#### Ignoring files (Hg ↔ Git)
- Goal
  - Respect Mercurial ignore rules locally without committing `.gitignore` to an Hg project
- Approach
  - Copy Hg ignore file into Git's local-only exclude file
    - `cp .hgignore .git/info/exclude`
  - Why it works
    - `.git/info/exclude` behaves like `.gitignore` but is not committed
    - Hg ignore format is compatible enough for this simple copy in the example

#### Typical workflow (clone → commit → fetch/merge → push)
- Local work and commits on `master`
  - Example log: local commits ahead of `origin/master`
- Check for remote changes
  - `git fetch`
  - May advance `origin/master` (from Hg changes made by others)
- Handle divergence
  - Mercurial supports merges, so you can do a normal Git merge:
    - `git merge origin/master`
- Share work
  - `git push`
- Verify on Mercurial side
  - `hg log -G --style compact`
  - Result
    - Hg changesets created from Git commits appear in Hg history (including merges)

#### Branches and bookmarks (concept mapping and operations)
- Conceptual differences
  - Git: one kind of branch (moving ref)
  - Mercurial: two related concepts
    - Bookmark: moving pointer (like Git branch)
    - Branch (heavyweight): branch name stored in each changeset; permanently part of history
- Why helper must care
  - Git can represent both with refs, but Mercurial's semantics differ

##### Creating Mercurial bookmarks via Git branches
- Git side
  - `git checkout -b featureA`
  - `git push origin featureA`
- Mercurial side
  - `hg bookmarks` shows bookmark `featureA`
  - Hg log shows `[featureA]` annotation on appropriate revision
- Limitation
  - Bookmark deletion not supported from Git side (remote helper limitation)

##### Working with Mercurial heavyweight branches via Git
- Create branch in Git under the `branches/` namespace
  - `git checkout -b branches/permanent`
  - commit changes
  - `git push origin branches/permanent`
- Mercurial side
  - `hg branches` shows `permanent` with tip changeset
  - `hg log -G` shows:
    - `branch: permanent` recorded in the changeset itself

##### History rewriting warning (Hg is append-only)
- Mercurial generally does not support rewriting published history; it adds new changesets instead
- If you do interactive rebase + force-push from Git
  - New changesets are created
  - Old changesets remain in repo history
- Risk
  - Can be very confusing to Mercurial users
- Guidance
  - Avoid rewriting history that has left your machine

#### Mercurial summary
- Working across Git/Hg boundary is typically low-friction
- If you avoid rewriting shared history, you may barely notice the remote is Mercurial

### Git and Bazaar (bzr) — `git-remote-bzr`
#### Context
- Bazaar (GNU Project) is a DVCS but behaves differently from Git
  - Different keywords for similar operations
  - Some common Git terms differ in meaning
  - Branch management is notably different → potential confusion for Git users
- Still possible to work on Bazaar repos from Git with a remote helper

#### Bridge overview: remote helper `git-remote-bzr`
- Project: https://github.com/felipec/git-remote-bzr
- Enables `git clone`/`fetch`/`push` against Bazaar repositories

#### Installation checklist
- Install helper script into PATH
  - `wget https://raw.github.com/felipec/git-remote-bzr/master/git-remote-bzr -O ~/bin/git-remote-bzr`
  - `chmod +x ~/bin/git-remote-bzr`
- Install Bazaar client (`bzr`)

#### Creating a Git repository from a Bazaar repository
- Clone using `bzr::` prefix
- Recommendation
  - Don't attach Git clone to a *local* Bazaar clone
    - even though both are full clones
  - Prefer attaching Git clone directly to the *central* Bazaar repository
- Example
  - Remote: `bzr+ssh://developer@mybazaarserver:myproject`
  - Git clone:
    - `git clone bzr::bzr+ssh://developer@mybazaarserver:myproject myProject-Git`
    - `cd myProject-Git`
- Post-clone optimization (disk compaction)
  - `git gc --aggressive`
  - Especially helpful for big repositories

#### Bazaar branches and cloning behavior
- Bazaar allows cloning branches; a repository may contain multiple branches
- `git-remote-bzr` can clone:
  - A specific branch
    - `git clone bzr::bzr://bzr.savannah.gnu.org/emacs/trunk emacs-trunk`
  - All branches in a repository
    - `git clone bzr::bzr://bzr.savannah.gnu.org/emacs emacs`
- Fetch only selected branches
  - Configure:
    - `git config remote-bzr.branches 'trunk, xwindow'`
- When remote repo does not allow listing branches
  - Manually specify branch list and fetch
    - `git init emacs`
    - `git remote add origin bzr::bzr://bzr.savannah.gnu.org/emacs`
    - `git config remote-bzr.branches 'trunk, xwindow'`
    - `git fetch`

#### Ignoring files (Bazaar `.bzrignore` ↔ Git ignores)
- Core concern
  - You shouldn't create/commit `.gitignore` into a Bazaar-managed project
    - Could disturb Bazaar users
- Solution
  - Use `.git/info/exclude` (local-only ignores)
  - Implement as:
    - symbolic link to `.bzrignore`, or
    - regular file that mirrors `.bzrignore`
- Bazaar ignore features beyond Git
  - `!!` prefix
    - ignore patterns even if re-included by a later `!` rule
  - `RE:` prefix
    - Python regular expression pattern (Git supports only glob patterns)
- Two cases
  - Case A: `.bzrignore` has no `!!` and no `RE:` lines
    - Safe to symlink:
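      - `ln -sf ../../.bzrignore .git/info/exclude` (a relative link target is resolved from `.git/info/`, and `-f` replaces the default `exclude` file)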
  - Case B: `.bzrignore` contains `!!` and/or `RE:`
    - Must create/edit `.git/info/exclude` manually to match ignore behavior
- Ongoing maintenance warning
  - Must monitor changes to `.bzrignore`
    - If `.bzrignore` changes to include unsupported syntax:
      - remove symlink (if used)
      - copy `.bzrignore` into `.git/info/exclude`
      - adapt patterns
- Git exclusion caveat
  - In Git, if a parent directory is excluded, you cannot later re-include a file inside it
  - Be careful translating Bazaar ignore semantics
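
The two cases above can be checked mechanically. This is a minimal sketch, assuming it runs from the top of the Git working tree; both function names are hypothetical, and the translation of `!!` and `RE:` lines is still left to a human.

```shell
# Succeed when a .bzrignore file uses only syntax Git also understands,
# i.e. it exists and contains no "!!" or "RE:" lines.
bzrignore_is_git_compatible() {
    [ -f "$1" ] && ! grep -Eq '^(!!|RE:)' "$1"
}

# Hypothetical helper: link the ignore file when safe (Case A),
# otherwise copy it so it can be edited by hand (Case B).
setup_bzr_excludes() {
    if bzrignore_is_git_compatible .bzrignore; then
        # The link target is relative to .git/info/, hence the ../../ prefix.
        ln -sf ../../.bzrignore .git/info/exclude
    else
        # Remove a possible old symlink first so cp does not copy a file onto itself.
        rm -f .git/info/exclude
        cp .bzrignore .git/info/exclude
        echo "edit .git/info/exclude: translate the !! and RE: lines by hand" >&2
    fi
}
```

Re-running `setup_bzr_excludes` after each fetch is one way to handle the ongoing maintenance warning, since it switches from link to copy as soon as unsupported syntax appears.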

#### Fetching from Bazaar remote (Git-side)
- Use normal Git commands
- Example (if working on `master`)
  - `git pull --rebase origin`
  - Merge/rebase your work onto `origin/master`

#### Pushing to Bazaar remote (Git-side)
- Bazaar supports merge commits
  - Pushing merge commits is acceptable
- Typical flow
  - work on branches
  - merge into `master`
  - push:
    - `git push origin master`

#### Caveats (remote-helper limitations)
- Some push operations aren't supported / behave unexpectedly
  - Branch deletion:
    - `git push origin :branch-to-delete` (doesn't work)
  - Refspec rename:
    - `git push origin old:new` (pushes `old`)
  - Dry-run:
    - `git push --dry-run origin branch` (will push anyway)

#### Bazaar summary
- Bazaar and Git are similar enough for reasonable interoperability
- Key to success
  - Know the remote isnt native Git
  - Respect remote-helper limitations

### Git and Perforce
#### Context
- Perforce (1995) — oldest VCS covered in chapter
- Designed for constraints of its era
  - Central server, always connected assumption
  - Only one version stored locally
- Still widely used in corporate settings
- Two ways to mix Git with Perforce
  - Git Fusion (server-side)
  - git-p4 (client-side)

#### Option 1: Perforce Git Fusion (server-side bridge)
##### Overview
- Product by Perforce: Git Fusion
  - http://www.perforce.com/git-fusion
- Synchronizes Perforce server with Git repositories on server side
- Exposes Perforce depot subtrees as read-write Git repos

##### Setting up Git Fusion (example: Perforce-provided VM)
- Installation method used in chapter
  - Download virtual machine image with Perforce daemon + Git Fusion
    - http://www.perforce.com/downloads/Perforce/20-User
  - Import into virtualization software (VirtualBox in example)
- First boot configuration prompts
  - Set passwords for Linux users:
    - `root`, `perforce`, `git`
  - Provide instance name (distinguish installations on same network)
- Note VM IP address (needed for cloning over HTTPS)
- Create a Perforce user (as root on VM)
  - `p4 -p localhost:1666 -u super user -f john`
    - Opens editor (VI); accept defaults with `:wq`
  - `p4 -p localhost:1666 -u john passwd`
    - Enter password twice
  - `exit`
- SSL certificate workaround for example
  - VM certificate doesn't match IP → Git rejects HTTPS
  - Temporary bypass:
    - `export GIT_SSL_NO_VERIFY=true`
  - For real installs: install correct certificate per Git Fusion manual
- Test clone of sample repo (Talkhouse)
  - `git clone https://<IP>/Talkhouse`
  - Prompts for credentials (john)
  - Credential cache helps subsequent commands
- Figure reference
  - Figure 145: Git Fusion virtual machine boot screen (shows IP)

##### Git Fusion configuration (via Perforce client)
- Configuration lives in Perforce depot path
  - `//.git-fusion` directory
- Map `//.git-fusion` into a Perforce workspace and browse/edit
- Directory structure (high level)
  - `objects/`
    - `repos/` and `trees/` (internal object mapping; usually don't edit)
  - global `p4gf_config`
  - per-repo config: `repos/<RepoName>/p4gf_config`
  - user mapping: `users/p4gf_usermap`
- Global `p4gf_config` characteristics
  - INI-style text file
  - Global defaults; can be overridden by repo-specific configs
  - Key sections shown (examples)
    - `[repo-creation] charset = utf8`
    - `[git-to-perforce]`
      - `change-owner = author`
      - `enable-git-branch-creation = yes`
      - `enable-swarm-reviews = yes`
      - `enable-git-merge-commits = yes`
      - `enable-git-submodules = yes`
      - `preflight-commit = none`
      - `ignore-author-permissions = no`
      - `read-permission-check = none`
      - `git-merge-avoidance-after-change-num = 12107`
    - `[perforce-to-git]` (`http-url`, `ssh-url`)
    - `[@features]` feature flags (imports, chunked-push, matrix2, parallel-push)
    - `[authentication] email-case-sensitivity = no`
- Repo-specific `p4gf_config`
  - Contains `[@repo]` section with per-repo overrides
  - Contains Perforce-branch ↔ Git-branch mappings via named sections

##### Branch mapping and view mappings (Git Fusion)
- Mapping section example
  - `[Talkhouse-master]`
    - `git-branch-name = master`
    - `view = //depot/Talkhouse/main-dev/... ...`
- Purpose of settings
  - `git-branch-name`
    - Choose friendlier Git branch names (avoid awkward Perforce paths)
  - `view`
    - Defines how Perforce files map into the Git repository
    - Uses standard Perforce view mapping syntax
- Multi-project mapping example
  - One Git branch can combine multiple Perforce depots/subtrees into subdirectories
  - Example view:
    - `//depot/project1/main/... project1/...`
    - `//depot/project2/mainline/... project2/...`

##### User identity mapping (Git Fusion: `users/p4gf_usermap`)
- Purpose
  - Map Perforce users to Git author identities (and vice versa)
- Default mapping behavior (without usermap)
  - Perforce → Git
    - Look up Perforce user; use stored full name + email in Git commit
  - Git → Perforce
    - Look up Perforce user by email in Git commit author field
    - Submit changeset as that Perforce user (permissions apply)
- Mapping file line format
  - `<user> <email> "<full name>"`
- Use cases
  - Multiple emails mapping to one Perforce account
    - Supports commits authored under different emails but attributed to same Perforce user
  - Anonymization / masking internal directory
    - Replace real names/emails with fictional/anonymous ones in exported Git commits
- Matching behavior detail
  - When creating Git commit from Perforce changeset:
    - first matching line for Perforce user supplies Git author info
  - Uniqueness recommendation
    - email + full name should be unique unless intentionally collapsing attribution

##### Workflow with Git Fusion (from the Git side)
- Clone a Git Fusion repository (example Jam)
  - `git clone https://<IP>/Jam`
- Initial clone behavior
  - Git Fusion converts applicable Perforce changesets → Git commits on server
  - Takes time proportional to history size
  - Later fetches are incremental and feel more native-speed
- Result feels like a normal Git repo
  - Typical refs:
    - `master`
    - `origin/master`, `origin/rel2.1`, etc.
- Standard Git workflow applies
  - Make commits locally
  - `git fetch` to update remote-tracking branches
  - `git merge origin/master` to integrate updates
  - `git push` to publish back
- Push mechanics (visible output)
  - Git Fusion runs conversion back into Perforce:
    - loads commit tree
    - finds child commits
    - runs `git fast-export`
    - checks commits
    - copies changelists
    - submits new Git commit objects to Perforce
  - Note: processing may continue even if connection closes
- Perforce-side visualization
  - p4v revision graph shows merge structure akin to Git
  - If Perforce lacks a named branch for Git-side commits
    - Git Fusion creates an “anonymous” branch under `.git-fusion` to hold them
  - Figure reference
    - Figure 146: Perforce revision graph resulting from Git push

##### Git Fusion summary
- Advantages
  - First-class interoperability when server admin can install it
  - Supports many “full Git” features comfortably
    - merge commits → recorded as Perforce integrations
    - submodules (though may look odd to Perforce users)
- Limitations
  - Will reject rewriting history that has already been pushed
- If Git Fusion not possible
  - Use client-side `git-p4`

#### Option 2: `git-p4` (client-side Perforce bridge)
##### Overview
- Two-way bridge between Git and Perforce
- Runs entirely inside your Git repository
  - No special Perforce server configuration required
- Less flexible/comprehensive than Git Fusion
  - But “good enough” for many workflows

##### Prerequisites / notes
- Requires `p4` CLI tool in your PATH
  - Free download (as referenced in chapter):
    - http://www.perforce.com/downloads/Perforce/20-User
- Must set environment variables for Perforce connection (example)
  - `export P4PORT=10.0.1.254:1666`
  - `export P4USER=john`

##### Getting started: cloning from Perforce
- Command
  - `git p4 clone //depot/www/live www-shallow`
- Result characteristics
  - “Shallow” import by default
    - imports only latest Perforce revision (`#head`)
    - aligns with Perforce's “not everyone has all history” model
  - Git view after clone
    - local `master`
    - Perforce state refs:
      - `p4/master`
      - `p4/HEAD`
- Important nuance: no Git remotes created
  - `git remote -v` → no remotes
  - Perforce state is represented as refs, not a Git-managed remote

##### Workflow: sync, rebase, submit
- Local development
  - commit locally on `master`
- Get latest from Perforce
  - `git p4 sync`
    - incremental import into `refs/remotes/p4/master`
- Keep history linear before submitting
  - divergence between `master` and `p4/master` is possible
  - recommended: rebase local commits on top of Perforce head
  - shortcut:
    - `git p4 rebase`
      - effectively: `git p4 sync` + `git rebase p4/master`
      - (with extra smarts for multi-branch situations)
- Submit work back to Perforce
  - `git p4 submit`
    - creates a Perforce changelist per Git commit between `p4/master` and `master`
    - opens editor for each changelist specification
    - imports Git commit message into Perforce change description
    - includes diff content for context
- Authorship mismatch warning (during submit)
  - If Git author email doesn't match your Perforce account:
    - message suggests:
      - `--preserve-user` to modify authorship
      - set `git-p4.skipUserNameCheck` to hide warning
- After submit completes
  - git-p4 performs another incremental import
  - rebases current branch onto `p4/master`
  - effect resembles a `git push` workflow
- Commit rewriting
  - Submitted commits' SHA-1 hashes change
  - git-p4 appends metadata line to commit message, e.g.:
    - `[git-p4: depot-paths = "//depot/www/live/": change = 12144]`
- Squashing strategy
  - To combine multiple Git commits into one Perforce changeset:
    - interactive rebase (squash) before `git p4 submit`

##### What about merge commits?
- Perforce branching model differs; merge commits aren't meaningful in Perforce changelist history
- `git p4 submit` behavior with merge commits
  - ignores merge commits
  - applies only the non-merge commits that aren't in Perforce yet
- Net effect
  - history becomes linear on submission (as though you rebased)
- Practical implication
  - You can branch and merge freely in Git locally
  - As long as you can rebase/linearize before submitting
- Caveat
  - Perforce integration metadata (branch lineage) is not preserved; only file-level changes are recorded

##### Branching with `git-p4`
- Example Perforce depot layout
  - `//depot/project/main`
  - `//depot/project/dev`
- Example Perforce branch spec view
  - `//depot/project/main/... //depot/project/dev/...`
- Clone with branch detection
  - `git p4 clone --detect-branches //depot/project@all`
    - `@all` imports all changesets that ever touched those paths (full history)
    - imports additional branches (e.g., `project/dev`)
    - updates branches list (e.g., `main dev`)
- When Perforce branch specs aren't present
  - Configure branch relationships manually
    - `git init project`
    - `cd project`
    - `git config git-p4.branchList main:dev`
      - declares `main` and `dev`; `dev` is a child of `main`
    - `git p4 clone --detect-branches //depot/project@all .`
- Working with detected branches
  - Create local branch from Perforce branch ref
    - `git checkout -b dev p4/project/dev`
  - `git p4 submit` targets correct Perforce branch automatically
- Limitations and operational constraints
  - Cannot mix shallow clones with multiple branches
  - For huge projects needing multiple submit targets
    - may need one `git p4 clone` per branch to submit to
  - Branch creation/integration must be done with Perforce tools
    - git-p4 can only sync/submit to existing branches
    - can only submit one linear changeset at a time
    - merge/integration metadata is lost if merging in Git

##### Git + Perforce summary
- `git-p4` enables Git-style local workflow with Perforce as server-of-record
- Be careful about sharing Git commits
  - Don't push commits to shared Git remotes unless already submitted to Perforce
- If possible and approved by admin
  - Git Fusion provides more seamless, first-class integration

## Part 2 — Migrating to Git (converting repositories into native Git)
### Why migrate
- Adopt Git as primary VCS for an existing codebase
- Goals
  - Preserve history as much as possible
  - Clean up author/branch/tag data during conversion
- Strategy
  - Use system-specific importers when available
  - Otherwise use `git fast-import` with a custom converter

### Migrating from Subversion (SVN)
#### Simple path (but imperfect)
- Use `git svn clone` to import
- Stop using SVN and push resulting Git repo to a new Git server
- Caveat
  - The default import is imperfect, and since any import takes a long time, it is worth doing it cleanly the first time

#### Author mapping (SVN usernames → Git identities)
- Problem
  - SVN records commit “author” as a username on the SVN system
  - Git prefers full identity: `Full Name <email>`
- Create `users.txt` mapping file
  - Format:
    - `svnuser = Full Name <email>`
  - Example:
    - `schacon = Scott Chacon <schacon@geemail.com>`
    - `selse = Someo Nelse <selse@geemail.com>`
- Generate initial list of SVN author names
  - `svn log --xml --quiet | grep author | sort -u | perl -pe 's/.*>(.*?)<.*/$1 = /'`
  - Then redirect output into `users.txt` and fill in names/emails
- Windows note
  - Migration steps may require special tooling; referenced guidance:
    - https://docs.microsoft.com/en-us/azure/devops/repos/git/perform-migration-from-svn-to-git
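
The author-list step above can also be done without `grep`/`perl`. A minimal Ruby sketch, assuming the `svn log --xml --quiet` output has been captured as a string; the helper name `author_skeleton` and the sample XML are made up for illustration:

```ruby
# Sketch: turn `svn log --xml --quiet` output into users.txt skeleton lines.
# `author_skeleton` is a hypothetical helper, not part of git or svn.
def author_skeleton(xml)
  xml.scan(%r{<author>(.*?)</author>}) # one capture group -> array of 1-element arrays
     .flatten
     .uniq
     .sort
     .map { |name| "#{name} = " }      # left side filled in, right side left for editing
end

# Illustrative sample only; real output carries more fields per log entry.
sample = <<~XML
  <log>
  <logentry revision="2"><author>selse</author></logentry>
  <logentry revision="1"><author>schacon</author></logentry>
  <logentry revision="3"><author>schacon</author></logentry>
  </log>
XML

puts author_skeleton(sample)
```

Each printed line still needs its `Full Name <email>` part filled in by hand before it is used as `users.txt`.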

#### Cleaner `git svn clone` for migration
- Recommended command pattern
  - `git svn clone http://my-project.googlecode.com/svn/ \`
    - `--authors-file=users.txt`
    - `--no-metadata`
    - `--prefix ""`
    - `-s`
    - `my_project`
- Option rationale
  - `--authors-file`
    - improves Author field quality in Git commits
  - `--no-metadata`
    - removes `git-svn-id` lines in commit messages (cleaner logs)
    - WARNING: keep metadata if you intend to mirror back to original SVN repo
  - `--prefix ""`
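    - stores the imported remote refs without the default `origin/` prefix, so they land directly under `refs/remotes/` (e.g., `refs/remotes/tags/...`)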
  - `-s`
    - assumes standard SVN trunk/branches/tags layout

#### Post-import cleanup (make imported refs idiomatic Git)
- Convert SVN tags (remote refs) into real Git tags
  - Problem
    - `git svn` stores tags as remote refs under `refs/remotes/tags/...`
  - Conversion loop (creates lightweight tags and deletes remote tag refs)
    - `for t in $(git for-each-ref --format='%(refname:short)' refs/remotes/tags); do`
      - `git tag ${t/tags\//} $t && git branch -D -r $t;`
      - `done`
- Convert remaining remote refs into local branches
  - `for b in $(git for-each-ref --format='%(refname:short)' refs/remotes); do`
    - `git branch $b refs/remotes/$b && git branch -D -r $b;`
    - `done`
- Remove peg-revision branches (optional cleanup)
  - Symptom
    - extra branches suffixed with `@<number>` (SVN “peg-revisions”)
  - If you dont need them:
    - `for p in $(git for-each-ref --format='%(refname:short)' | grep @); do`
      - `git branch -D $p;`
      - `done`
- Remove redundant `trunk` branch
  - `git svn` often creates `trunk` ref that points where `master` points
  - Remove:
    - `git branch -d trunk`
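
The cleanup steps above amount to classifying each imported ref name. A rough Ruby sketch of that decision logic, assuming the short names come from `git for-each-ref --format='%(refname:short)' refs/remotes` after an import with `--prefix ""`; `plan_cleanup` and the ref names are hypothetical, and the sketch simplifies by dropping any peg-revision ref before considering whether it is a tag:

```ruby
# Sketch: decide what to do with each imported remote ref (no git calls made).
# Returns tags to create (new name => source ref), branches to create, refs to drop.
def plan_cleanup(refs)
  plan = { tags: {}, branches: [], drop: [] }
  refs.each do |ref|
    if ref.include?('@')              # peg-revision leftovers such as "feature-x@42"
      plan[:drop] << ref
    elsif ref.start_with?('tags/')    # SVN tag stored as a remote ref
      plan[:tags][ref.sub('tags/', '')] = ref
    elsif ref == 'trunk'              # redundant: points where master points
      plan[:drop] << ref
    else
      plan[:branches] << ref
    end
  end
  plan
end

p plan_cleanup(%w[tags/v1.0 trunk feature-x feature-x@42])
```

Printing the plan before running the real `git tag` and `git branch` loops is a cheap way to spot refs that would be converted or deleted unexpectedly.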

#### Push migrated repo to Git server
- Add remote
  - `git remote add origin git@my-git-server:myrepository.git`
- Push all branches
  - `git push origin --all`
- Push tags
  - `git push origin --tags`

### Migrating from Mercurial (Hg)
#### Why its straightforward
- Git and Mercurial data models are similar
- Git is flexible in representing refs/tags

#### Tool: `hg-fast-export`
- Acquire tool
  - `git clone https://github.com/frej/fast-export.git /tmp/fast-export`

#### Steps
- Full clone the Mercurial repo to convert
  - `hg clone <remote repo URL> /tmp/hg-repo`
- Create author mapping file (optional cleanup but often necessary)
  - Generate list:
    - `cd /tmp/hg-repo`
    - `hg log | grep user: | sort | uniq | sed 's/user: *//' > ../authors`
  - Convert each line into rule syntax:
    - `"<input>"="<output>"`
  - Notes
    - Mercurial allows looser author strings than Git
    - Mapping file can normalize duplicates, fix invalid formats
    - Supports Python `string_escape` sequences in mapping strings
    - Unmatched inputs pass through unchanged
  - Also usable to rename branches/tags if Mercurial names invalid in Git
    - Branch mapping: `-B`
    - Tag mapping: `-T`
- Create a new Git repository and run export
  - `git init /tmp/converted`
  - `cd /tmp/converted`
  - `/tmp/fast-export/hg-fast-export.sh -r /tmp/hg-repo -A /tmp/authors`
- Output expectations
  - Exporter reports per-revision progress and file deltas
  - Mercurial tags exported to Git tags
  - Mercurial branches/bookmarks become Git branches
  - Ends with `git-fast-import` statistics
- Validate author consolidation
  - `git shortlog -sn`
- Publish to Git server
  - `git remote add origin git@my-git-server:myrepository.git`
  - `git push origin --all`
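
To make the mapping-file semantics from the steps above concrete, here is a small Ruby model of them. It only mirrors the behavior described in this outline (quoted `"<input>"="<output>"` rules, unmatched inputs passing through); it is not `hg-fast-export`'s own code, it ignores escape sequences, and `parse_rules` / `map_author` are hypothetical names:

```ruby
# Sketch: parse "<input>"="<output>" rules and apply them to author strings.
def parse_rules(text)
  text.each_line.with_object({}) do |line, rules|
    m = line.strip.match(/\A"(.*)"="(.*)"\z/)
    rules[m[1]] = m[2] if m           # lines that are not rules are skipped
  end
end

def map_author(rules, author)
  rules.fetch(author, author)         # unmatched inputs pass through unchanged
end

rules = parse_rules(<<~RULES)
  "bob"="Bob Jones <bob@company.com>"
  "bob@localhost"="Bob Jones <bob@company.com>"
RULES

puts map_author(rules, 'bob')
puts map_author(rules, 'alice')
```

Running the raw author list through a model like this before the real export shows which spellings would be consolidated and which would be left untouched.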

### Migrating from Bazaar (bzr)
#### Tooling: Bazaar fast-export → Git fast-import
- Requires `bzr-fastimport` plugin (and Python module dependencies)

#### Install `bzr-fastimport`
- Linux/Unix-like (preferred: package manager)
  - Debian/Ubuntu:
    - `sudo apt-get install bzr-fastimport`
  - RHEL:
    - `sudo yum install bzr-fastimport`
  - Fedora 22+:
    - `sudo dnf install bzr-fastimport`
- If package unavailable: install plugin manually
  - `mkdir --parents ~/.bazaar/plugins`
  - `cd ~/.bazaar/plugins`
  - `bzr branch lp:bzr-fastimport fastimport`
  - `cd fastimport`
  - `sudo python setup.py install --record=files.txt`
- Ensure Python module `fastimport` is present
  - Check:
    - `python -c "import fastimport"`
  - If missing:
    - `pip install fastimport`
  - Source:
    - https://pypi.python.org/pypi/fastimport/
- Windows
  - Standalone/default Bazaar install includes `bzr-fastimport` (no extra steps)

#### Import scenarios
##### Single-branch Bazaar project
- `cd /path/to/the/bzr/repository`
- Initialize Git
  - `git init`
- Export + import
  - `bzr fast-export --plain . | git fast-import`
- Expected time
  - seconds to minutes depending on repo size

##### Bazaar repository with multiple branches (main + working branch)
- Example branch directories
  - `myProject.trunk` (main)
  - `myProject.work` (working branch)
- Create Git repo
  - `git init git-repo`
  - `cd git-repo`
- Import trunk as Git master (with marks)
  - `bzr fast-export --export-marks=../marks.bzr ../myProject.trunk | \`
    - `git fast-import --export-marks=../marks.git`
- Import work branch as Git branch `work` (reusing marks)
  - `bzr fast-export --marks=../marks.bzr --git-branch=work ../myProject.work | \`
    - `git fast-import --import-marks=../marks.git --export-marks=../marks.git`
- Verify
  - `git branch` should show `master` and `work`
  - Inspect logs
  - Remove mark files (`marks.bzr`, `marks.git`) after confirmation

#### Synchronize working directory + index after import
- Issue
  - staging area may not match HEAD
  - working directory may not match HEAD after multi-branch import
- Fix
  - `git reset --hard HEAD`

#### Convert ignore rules (.bzrignore → .gitignore)
- Rename ignore file
  - `git mv .bzrignore .gitignore`
- If `.bzrignore` uses Bazaar-only constructs (`!!`, `RE:`)
  - modify `.gitignore` (possibly multiple `.gitignore` files) to match behavior
- Commit this conversion as part of migration
  - `git commit -am 'Migration from Bazaar to Git'`
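
Before renaming the ignore file, it can help to list the lines that need manual attention. A minimal Ruby sketch, assuming the only Bazaar-specific constructs to look for are the `!!` and `RE:` prefixes named above; `split_bzrignore` and the sample patterns are hypothetical:

```ruby
# Sketch: separate ignore lines that can be kept as-is from lines that use
# Bazaar-only syntax and must be rewritten by hand for Git.
def split_bzrignore(text)
  text.each_line.map(&:chomp).reject(&:empty?).partition do |line|
    !line.start_with?('!!', 'RE:')
  end
end

portable, needs_review = split_bzrignore("*.pyc\nRE:^build/.*\n!!keep.log\n")
puts "keep as-is:   #{portable.inspect}"
puts "rewrite:      #{needs_review.inspect}"
```

The sketch only flags lines; how each flagged rule should be expressed in `.gitignore` still depends on what the original rule was meant to match.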

#### Publish to Git server
- `git remote add origin git@my-git-server:mygitrepository.git`
- `git push origin --all`
- `git push origin --tags`

### Migrating from Perforce
#### Approach A: Perforce Git Fusion
- Configure project, branches, and user mappings in Git Fusion
- Clone Git Fusion repo (appears native Git)
- Push to a native Git host if desired
- Optionally, Perforce (via Git Fusion) can continue to host Git repos

#### Approach B: `git-p4` as an import tool
- Example: import Jam from Perforce Public Depot
- Set Perforce server
  - `export P4PORT=public.perforce.com:1666`
- Import full history of subtree (`@all`)
  - `git p4 clone //guest/perforce_software/jam@all p4import`
- Branches
  - Use `--detect-branches` if you want multiple branches (when available/configured)
- Inspect imported history
  - `git log`
  - Commits include Perforce change marker line:
    - `[git-p4: depot-paths = "...": change = N]`
- Optional cleanup: remove git-p4 marker lines (do this before new work)
  - `git filter-branch --msg-filter 'sed -e "/^\[git-p4:/d"'`
  - Effect
    - rewrites commit history; SHA-1 hashes change
- Publish to new Git server (after cleanup/verification)
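
The message filter above deletes every line that starts with `[git-p4:`. The same transformation in Ruby, as a sketch for previewing what a commit message would look like afterwards; `strip_p4_marker` is a hypothetical helper and the depot path in the sample is invented:

```ruby
# Sketch: remove git-p4 marker lines from a commit message,
# mirroring sed -e "/^\[git-p4:/d".
def strip_p4_marker(message)
  message.each_line.reject { |line| line.start_with?('[git-p4:') }.join
end

msg = "Fix build on Windows\n\n[git-p4: depot-paths = \"//depot/example/\": change = 42]\n"
print strip_p4_marker(msg)
```

As with the `sed` version, the blank line that preceded the marker stays in the message.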

### A custom importer (when no prebuilt tool exists) — `git fast-import`
#### When to use
- No quality importer exists for your legacy VCS or storage format
- You need customized mapping/cleanup beyond available tools

#### Why `git fast-import`
- Accepts a simple, line-oriented instruction stream on stdin
- Efficiently creates Git objects (blobs/trees/commits/refs/tags)
- Much easier than
  - invoking raw/plumbing commands per object, or
  - writing raw Git objects directly

#### Example data source: timestamped directory backups
- Source directory structure
  - `back_YYYY_MM_DD/` (snapshots)
  - `current/` (latest snapshot)
- Goal
  - Import each snapshot as a commit in a linear history
  - Each commit represents full tree state at that snapshot

#### Git storage reminder (mapping problem to solution)
- Git history is a chain of commit objects linked by parent pointers (a DAG in general; a simple linked list in this linear example)
- Each commit points to a snapshot (tree)
- So importer must emit
  - tree content for each snapshot
  - commit metadata + parent linkage
  - order of commits

#### Strategy for the example importer
- Walk snapshot directories in order
- For each snapshot:
  - create a new commit
  - link it to previous commit (parent)
  - wipe tree (`deleteall`) and re-add all files (full snapshot approach)
- Notes
  - fast-import also supports delta-style imports (add/modify/delete only), but thats more complex
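
Walking the snapshots "in order" depends on how the directory names sort. A small Ruby sketch, assuming zero-padded `back_YYYY_MM_DD` names as in the example layout, with `current` forced to the end; `snapshot_order` is a hypothetical helper:

```ruby
# Sketch: order snapshot directory names oldest-first, with "current" last.
# Zero-padded dates sort correctly as plain strings.
def snapshot_order(names)
  names.sort_by { |name| name == 'current' ? [1, name] : [0, name] }
end

p snapshot_order(%w[current back_2014_02_03 back_2014_01_02])
```

If the real directory names are not zero-padded, the sort key would need to parse the date instead of comparing strings.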

#### Ruby implementation (key pieces)
- Language choice
  - Ruby used for readability and convenience
  - Any language works if it can output proper fast-import stream
- Windows newline caution
  - `git fast-import` expects LF (not CRLF)
  - Ruby fix:
    - `$stdout.binmode`

##### Main loop (iterate snapshots)
- Pseudocode shape
  - `last_mark = nil`
  - `Dir.chdir(ARGV[0]) do`
    - `Dir.glob("*").sort.each do |dir|`
      - `next if File.file?(dir)`
      - `Dir.chdir(dir) do`
        - `last_mark = print_export(dir, last_mark)`
      - `end`
    - `end`
  - `end`

##### Marks (fast-import commit identifiers)
- Definition
  - “mark” is an integer ID used to reference commits within fast-import stream
- Implementation: map directory names to sequential integers
  - Global: `$marks = []`
  - `convert_dir_to_mark(dir)`
    - add dir to `$marks` if not already present
    - return `($marks.index(dir) + 1).to_s`
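
Written out as runnable Ruby, the mark helper described above might look like this (a sketch following the outline; the directory names are examples):

```ruby
# Sketch: map each directory name to a stable, 1-based mark number.
$marks = []

def convert_dir_to_mark(dir)
  $marks << dir unless $marks.include?(dir)  # register on first sight
  ($marks.index(dir) + 1).to_s               # marks start at 1
end

puts convert_dir_to_mark('back_2014_01_02')
puts convert_dir_to_mark('back_2014_01_04')
puts convert_dir_to_mark('back_2014_01_02')  # same directory, same mark as before
```

The array lookup is linear, which is fine for a handful of snapshots; a hash would be the natural swap for a large number of them.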

##### Dates (commit timestamps from directory names)
- Need integer timestamp for committer line
- `convert_dir_to_date(dir)`
  - if `dir == 'current'` → `Time.now().to_i`
  - else
    - strip prefix `back_`
    - parse `year, month, day`
    - use `Time.local(year, month, day).to_i`
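
A runnable version of the date helper, as a sketch. The explicit base-10 integer conversion is an addition here, included so that parts like `08` are never misread:

```ruby
# Sketch: derive a Unix timestamp from a snapshot directory name.
def convert_dir_to_date(dir)
  if dir == 'current'
    Time.now.to_i
  else
    year, month, day = dir.sub('back_', '').split('_').map { |part| Integer(part, 10) }
    Time.local(year, month, day).to_i   # midnight, local time zone
  end
end

puts convert_dir_to_date('back_2014_01_02')
puts convert_dir_to_date('current')
```

Because `Time.local` uses the machine's time zone, the resulting integers differ between machines in different zones.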

##### Author/committer identity
- Hardcoded for example
  - `$author = 'John Doe <john@example.com>'`

##### Fast-import commit record structure (what gets printed)
- For each snapshot commit:
  - `commit refs/heads/master`
  - `mark :<mark>`
  - `committer <author> <timestamp> -0700`
    - timezone hardcoded as `-0700` in example
  - commit message via `data` directive:
    - `"imported from <dir>"`
  - parent link (except first commit):
    - `from :<last_mark>`
  - tree content:
    - `deleteall`
    - for each file: `M <mode> inline <path>` + inline `data` (file content)
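
Putting the record structure above together, the following Ruby sketch builds one commit record as a string instead of printing it, which makes it easy to inspect. `commit_record` and its keyword parameters are hypothetical, file contents are assumed to be plain text, and every file gets mode `644`:

```ruby
# Sketch: assemble the fast-import text for one snapshot commit.
def commit_record(mark:, author:, timestamp:, message:, files:, parent: nil)
  out = +""
  out << "commit refs/heads/master\n"
  out << "mark :#{mark}\n"
  out << "committer #{author} #{timestamp} -0700\n"   # timezone hardcoded as in the example
  out << "data #{message.bytesize}\n#{message}\n"
  out << "from :#{parent}\n" if parent                # omitted for the first commit
  out << "deleteall\n"
  files.each do |path, content|
    out << "M 644 inline #{path}\n"
    out << "data #{content.bytesize}\n#{content}\n"
  end
  out
end

print commit_record(
  mark: 2, parent: 1,
  author: 'John Doe <john@example.com>',
  timestamp: 1_388_649_600,
  message: 'imported from back_2014_01_04',
  files: { 'README' => "hello\n" }
)
```

Reading the printed record next to the bullet list above is a quick way to check that each directive appears in the order fast-import expects.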

##### Helper: exporting data blocks (`data <size>\n<content>`)
- Used for both
  - commit messages
  - file contents
- `export_data(string)`
  - prints:
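    - `data #{string.bytesize}\n#{string}` (the count must be in bytes; `string.size` counts characters and is wrong for multibyte content)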
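
`git fast-import` reads the number after `data` as a byte count, so character count and byte count must not be confused. A sketch of the helper that writes to any IO object; the `io` parameter is an addition for illustration, defaulting to standard output:

```ruby
require 'stringio'

# Sketch: emit one fast-import data block with an exact byte count.
def export_data(string, io = $stdout)
  io.print "data #{string.bytesize}\n#{string}"
end

buffer = StringIO.new
export_data('héllo', buffer)   # 5 characters, 6 bytes in UTF-8
puts buffer.string
puts "characters: #{'héllo'.size}, bytes: #{'héllo'.bytesize}"
```

For plain ASCII content the two counts agree, which is why a character-based count can appear to work until the first non-ASCII file is imported.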

##### Helper: writing a file blob inline
- `inline_data(file, code = 'M', mode = '644')`
  - `content = File.read(file)`
  - `puts "#{code} #{mode} inline #{file}"`
  - `export_data(content)`
- Mode notes
  - `644` for normal files
  - must detect executables and use `755` when needed
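
One way to pick the mode is to ask the filesystem, sketched below. `mode_for` is a hypothetical helper; it assumes a Unix-like system where execute bits are meaningful and does not handle symlinks:

```ruby
require 'tempfile'

# Sketch: choose the fast-import file mode from the executable bit.
def mode_for(file)
  File.executable?(file) ? '755' : '644'
end

Tempfile.create('mode-demo') do |f|
  File.chmod(0o644, f.path)
  puts mode_for(f.path)
  File.chmod(0o755, f.path)
  puts mode_for(f.path)
end
```

The result could then be passed as the `mode` argument of `inline_data` in place of the default.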

##### `print_export(dir, last_mark)` responsibilities
- Compute metadata
  - `date = convert_dir_to_date(dir)`
  - `mark = convert_dir_to_mark(dir)`
- Print commit header + metadata + message
- Print parent link if present
- Print `deleteall`
- Walk all files in snapshot
  - `Dir.glob("**/*")`
  - `next if !File.file?(file)`
  - `inline_data(file)`
- Return `mark` to become next iterations `last_mark`

##### Full script structure (as presented)
- Shebang
  - `#!/usr/bin/env ruby`
- Windows newline fix
  - `$stdout.binmode`
- Globals
  - `$author = "John Doe <john@example.com>"`
  - `$marks = []`
- Functions
  - `convert_dir_to_mark`
  - `convert_dir_to_date`
  - `export_data`
  - `inline_data`
  - `print_export`
- Main loop (iterates snapshot directories, updating `last_mark`)

#### Running the importer
- Create target Git repo
  - `git init`
- Pipe importer output into `git fast-import`
  - `ruby import.rb /opt/import_from | git fast-import`
- Successful run yields
  - `git-fast-import statistics` summary (objects, branches, marks, memory, etc.)
- Verify commit history
  - `git log`
- Working tree behavior
  - After import, nothing is checked out by default
  - Populate working directory:
    - `git reset --hard master`

#### Extending beyond the example
- `git fast-import` can handle
  - file mode changes (e.g., executable bits)
  - binary data
  - multiple branches
  - merges
  - tags
  - progress indicators
- Reference
  - examples in Git source: `contrib/fast-import/`
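
Two of the extensions listed above, tags and progress indicators, need only a few extra lines in the stream. A hedged Ruby sketch that builds them as strings; `annotated_tag` and `progress` are hypothetical helper names, and the tag name, mark number, and timestamp are invented:

```ruby
# Sketch: fast-import text for an annotated tag pointing at a marked commit.
def annotated_tag(name:, target_mark:, tagger:, timestamp:, message:)
  "tag #{name}\n" \
  "from :#{target_mark}\n" \
  "tagger #{tagger} #{timestamp} -0700\n" \
  "data #{message.bytesize}\n#{message}\n"
end

# Sketch: a progress line, which fast-import echoes back while it runs.
def progress(message)
  "progress #{message}\n"
end

print progress('importing tags')
print annotated_tag(
  name: 'v1.0', target_mark: 3,
  tagger: 'John Doe <john@example.com>',
  timestamp: 1_388_649_600,
  message: 'First imported release'
)
```

These strings would be written to the same stream as the commit records, after the commit whose mark the tag refers to.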

## Chapter wrap-up (Summary)
- You can use Git effectively even when the central system is not Git
  - via bridges/remote helpers (`git svn`, `git-remote-hg`, `git-remote-bzr`, Git Fusion, `git-p4`)
- You can migrate repositories from common VCS into native Git
  - SVN, Mercurial, Bazaar, Perforce
  - plus custom sources via `git fast-import`
- Next step (as hinted in chapter)
  - understanding Git internals enables even more precise control over repository data