# Git and Other Systems (Chapter 7)
## Chapter purpose / big picture
- Reality check: you can't always switch every project to Git immediately
- Two major goals
- Use Git locally while the “official” repository lives in another VCS (Git as a client)
- Migrate/convert an existing repository from another VCS into Git (Migrating to Git)
- Key idea: “bridges/adapters” let Git interoperate with centralized systems or other DVCSs
- Recurring caveat theme throughout
- Different VCS have different data models (linear history vs merge history, tags/branches semantics, etc.)
- Bridges often require constraints (e.g., keep history linear, avoid rewriting)
## Part 1 — Git as a Client (working with non-Git servers)
### What “bridges” enable
- Keep Git’s local UX (branching, merging, staging, rebase, cherry-pick, etc.)
- Collaborators can keep using their existing VCS server + client tools
- Often useful as an incremental adoption path (“sneak Git in”)
### Git and Subversion (SVN) — `git svn`
#### Background: why SVN matters
- Widely used in open source + corporate environments
- Longstanding “default” centralized VCS for many projects
- Designed as a successor to CVS
- SVN constraints that influence workflows
- Centralized, linear, single “official” history
- Merges recorded differently than in Git (and often more limited)
#### Bridge overview: `git svn`
- Bidirectional bridge to an SVN server
- Lets you
- Work locally with Git features (branches, merges, staging, rebase, cherry-pick)
- Publish work back to SVN as if using SVN client
- Practical role
- Helps teams gain Git productivity without server migration
- Often a stepping stone (“gateway drug” to DVCS)
#### Mental model + rules of thumb (critical differences from pure Git)
- You are interacting with Subversion, not Git
- Best practices to avoid confusion
- Keep history as linear as possible
- Prefer rebasing over merging
- Avoid merge commits in publishable history
- Avoid simultaneously collaborating via a Git remote repository
- Don’t push to a parallel Git server and SVN at the same time
- Don’t rewrite history after publishing to SVN, then try to push again
- Team coordination guideline
- If some devs use SVN clients and others use `git svn`, everyone should collaborate via the SVN server (single source of truth)
#### `git svn` command family (entry point)
- Base command: `git svn`
- Provides many subcommands
- Common ones shown through workflows
#### Setting up an SVN repo for examples (local writable mirror)
- Need an SVN repository with write access
- Tool used: `svnsync` (ships with Subversion)
- Create a new local SVN repository
- `mkdir /tmp/test-svn`
- `svnadmin create /tmp/test-svn`
- Enable changing revprops (revision properties)
- Add hook: `/tmp/test-svn/hooks/pre-revprop-change`
- Content:
- `#!/bin/sh`
- `exit 0;`
- Make executable: `chmod +x /tmp/test-svn/hooks/pre-revprop-change`
- Initialize sync metadata
- `svnsync init file:///tmp/test-svn http://your-svn-server.example.org/svn/`
- Sync revisions into the local mirror
- `svnsync sync file:///tmp/test-svn`
- Notes
- Copies one revision at a time
- Very inefficient (but simplest approach)
- Remote-to-remote sync can take a long time even for smallish histories
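Creating the revprop hook can be scripted. The following is a minimal POSIX `sh` sketch. It assumes `svnadmin create` has already produced the repository. `install_revprop_hook` is a hypothetical helper name, not part of Subversion.

```shell
# Hypothetical helper: install a pre-revprop-change hook that allows every
# revision-property change, which svnsync needs on the target repository.
install_revprop_hook() {
  repo=$1                                   # e.g. /tmp/test-svn
  hook="$repo/hooks/pre-revprop-change"
  [ -d "$repo/hooks" ] || { echo "no hooks dir in $repo" >&2; return 1; }
  # Same hook body as in the steps above: always exit successfully.
  printf '#!/bin/sh\nexit 0;\n' > "$hook" || return 1
  chmod +x "$hook"
}

# Usage, after `svnadmin create /tmp/test-svn`:
#   install_revprop_hook /tmp/test-svn
```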
#### Getting started: importing SVN into a Git repo
- Clone/import SVN repository
- Full layout options:
- `git svn clone file:///tmp/test-svn -T trunk -b branches -t tags`
- Standard layout shorthand:
- `git svn clone file:///tmp/test-svn -s`
- What this does under the hood
- Equivalent to:
- `git svn init` then `git svn fetch`
- Performance note
- Git must check out each SVN revision sequentially and commit it
- 100s/1000s of commits can take hours or days
- Layout flags meaning
- `-T trunk` → trunk directory name
- `-b branches` → branches directory name
- `-t tags` → tags directory name
- `-s` → “standard layout” (implies all of the above)
- Customize if SVN repo uses nonstandard paths
#### Resulting refs: branches/tags as seen in Git
- Inspect imported refs
- `git branch -a`
- `git show-ref`
- Important nuance: SVN tags handled as remote refs
- `git svn` imports SVN tags as remote refs under:
- `refs/remotes/origin/tags/...` (with the default `origin/` prefix)
- Contrast: native Git clone stores tags directly under:
- `refs/tags/...`
- Practical implication
- You’ll often want post-import cleanup if migrating permanently (covered later)
#### Committing back to Subversion
- Local Git commit
- Example: `git commit -am 'Adding git-svn instructions to the README'`
- Publish to SVN
- `git svn dcommit`
- What `dcommit` does (key behavior)
- Takes each local commit atop SVN’s tip and commits it to SVN one-by-one
- Rewrites your local Git commits after publishing
- Adds a `git-svn-id` line to each commit message
- Changes SHA-1s for the commits (history rewritten locally)
- Consequence: “SVN first” if dual-publishing
- If you must push to both SVN and a Git server:
- `dcommit` to SVN first, then push to Git
- Because `dcommit` changes commit data
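`dcommit` records the SVN location and revision in each rewritten message, so you can read them back later. The line layout `git-svn-id: <url>@<revision> <repository-uuid>` is what `git svn` writes, and `git svn find-rev <commit>` performs the same lookup when the bridge is installed. The sketch below only parses a message read from stdin. `svn_rev_of` is a hypothetical name.

```shell
# Hypothetical helper: print the SVN revision number recorded by git svn in a
# commit message read from stdin (prints nothing if there is no git-svn-id).
svn_rev_of() {
  sed -n 's/^git-svn-id: [^ ]*@\([0-9][0-9]*\) .*$/\1/p'
}

# Usage inside a git-svn clone:
#   git log -1 --format=%B HEAD | svn_rev_of
```

A commit that has not been dcommitted yet produces no output, which is a quick way to spot local-only work.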
#### Pulling in new changes (keeping in sync with SVN)
- Symptom: `dcommit` rejected because SVN has advanced
- Error example: “Transaction is out of date”
- Resolution: rebase against SVN
- `git svn rebase`
- Fetches changes from SVN you don’t have yet
- Rebases your local commits on top of updated SVN tip
- May involve conflict resolution
- After rebase
- `git svn dcommit` should succeed
- Behavior difference vs Git server
- Git server: a non-fast-forward push is rejected, so you must always integrate upstream work first
- `git svn` makes you integrate only when conflicts occur (SVN-like)
- Non-conflicting edits in different files may still allow `dcommit`
- But `git svn` may still perform a rebase internally
- Critical caveat: published state may be “untestable” locally
- Because SVN accepts sequential commits without requiring a full pre-tested merged state
- Resulting repo state may not have existed on any client machine
- Can yield subtle incompatibilities
- Keeping updated routinely
- Prefer `git svn rebase` periodically
- Fetches new SVN commits and rebases your branch onto them
- Working directory must be clean
- Stash or temporarily commit local changes before rebasing
#### Git branching issues when SVN is the server
- Git encourages topic branches + merges
- With `git svn`, prefer rebasing topic work onto mainline
- Why
- SVN has linear history and doesn’t model merges like Git
- `git svn` conversion follows only the first parent when turning Git history into SVN commits
- If you `dcommit` a merged history
- `dcommit` will succeed, but…
- Only the merge commit gets rewritten; the original topic-branch commits won’t appear individually in SVN history
- Others cloning will see a “squashed” result
- Similar to `git merge --squash`
- Lose detailed commit provenance/timing from topic branch
#### Subversion branching with `git svn`
##### Creating a new SVN branch
- Command: `git svn branch <new-branch>`
- Example: `git svn branch opera`
- What it does
- Equivalent to `svn copy trunk branches/opera`
- Operates on the SVN server
- Common gotcha
- It does NOT switch your working directory to the new branch
- If you commit now, you still commit to SVN trunk (not the new branch)
##### Switching active branches / targeting `dcommit`
- How `dcommit` decides where to commit
- Looks for tip of an SVN branch (git-svn-id) in your history
- Assumption: there should be only one, and it should be the last git-svn-id in your current branch history
- Working on multiple SVN branches simultaneously (Git-side strategy)
- Create local Git branches rooted at the corresponding imported SVN refs
- Example:
- `git branch opera remotes/origin/opera`
##### Merging SVN branches using Git
- You can merge locally with `git merge`
- Example: merge `opera` into trunk (master)
- Provide a meaningful merge commit message
- Use `-m` to avoid generic “Merge branch opera”
- After `dcommit`
- SVN can’t store true merge-parent info
- `dcommit` will squash merge history into a single commit in SVN
- Merge ancestry info is erased → future merge-base calculations in Git become wrong
- Practical workaround / best practice
- After merging a feature branch into trunk and `dcommit`ing:
- delete the local feature branch (e.g., `opera`)
- avoids later incorrect merges / confusion
#### SVN-like helper commands provided by `git svn`
##### SVN-style history
- Command: `git svn log`
- Properties
- Runs offline (unlike `svn log`, which queries the server)
- Shows only commits that have been committed to SVN (dcommitted)
- Does not show:
- local Git-only commits (not yet dcommitted)
- new SVN commits created since last communication
- Best thought of as “last known SVN commit state”
##### SVN annotation / blame
- Command: `git svn blame <file>`
- Equivalent to `svn annotate`
- Same limitations as `git svn log`
- Offline
- Only includes commits known as of last SVN interaction
##### SVN server information
- Command: `git svn info`
- Equivalent to `svn info`
- Offline + last-known-state behavior
##### Ignoring what SVN ignores
- Problem
- SVN ignores are often stored as `svn:ignore` properties
- Git users want equivalent ignore behavior to avoid accidentally committing ignored files
- Tools
- `git svn create-ignore`
- Creates corresponding `.gitignore` files in working tree
- Intended to be committed on next commit (if desired)
- `git svn show-ignore`
- Prints ignore rules (stdout)
- Useful to keep ignores local-only:
- `git svn show-ignore > .git/info/exclude`
- Avoids committing `.gitignore` files
- Useful if you’re the only Git user and teammates don’t want `.gitignore` artifacts in SVN repo
#### Git–SVN summary (what to remember)
- `git svn` is valuable when SVN server is unavoidable
- Treat it as “crippled Git”
- Many Git workflows don’t translate cleanly to SVN’s linear model
- Safe-operating guidelines (to avoid confusing SVN / teammates)
- Keep a linear Git history; avoid merge commits
- Rebase topic work onto mainline; don’t merge it
- Don’t collaborate using a parallel Git server
- If you use a Git server for faster clones:
- don’t push commits lacking `git-svn-id`
- consider a pre-receive hook to reject commits without `git-svn-id`
- If possible: migrate to a real Git server for full benefits
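The pre-receive hook suggested above could look like the following sketch. It assumes the Git server only mirrors SVN, so every newly pushed commit must already carry a `git-svn-id:` line. Git feeds a pre-receive hook one `<old-sha> <new-sha> <refname>` line per updated ref on stdin. The function name `check_push` is hypothetical.

```shell
# Sketch of hooks/pre-receive for a Git server that only mirrors SVN:
# reject any pushed commit whose message lacks a git-svn-id line.
check_push() {
  status=0
  # Git passes "<old-sha> <new-sha> <refname>" per updated ref on stdin.
  while read -r old new ref; do
    case $new in *[!0]*) ;; *) continue ;; esac   # all zeros: ref deletion
    # Only commits this push introduces (not reachable from existing refs).
    for c in $(git rev-list "$new" --not --all); do
      if ! git log -1 --format=%B "$c" | grep -q '^git-svn-id: '; then
        echo "rejected $ref: $c has no git-svn-id (not dcommitted to SVN)" >&2
        status=1
      fi
    done
  done
  return $status
}

# Run the check only when this file is executed as the hook itself.
case "$0" in
  *pre-receive) check_push; exit $? ;;
esac
```

Checking only commits not reachable from existing refs (`--not --all`) keeps the hook fast and avoids re-rejecting history that is already on the server.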
### Git and Mercurial (Hg) — `git-remote-hg`
#### Context
- DVCS ecosystem includes Git + others; Mercurial is most popular non-Git DVCS
- Git and Mercurial are conceptually similar → interoperability is relatively smooth
#### Bridge overview: remote helper `git-remote-hg`
- Project: https://github.com/felipec/git-remote-hg
- Implemented as a Git “remote helper”
- Same general mechanism used by Git’s HTTP/S remote support
- Benefit
- Use standard Git commands (`clone`, `fetch`, `push`) against an Hg-backed remote
#### Installation checklist
- Install helper script into PATH
- `curl -o ~/bin/git-remote-hg https://raw.githubusercontent.com/felipec/git-remote-hg/master/git-remote-hg`
- `chmod +x ~/bin/git-remote-hg`
- Python dependency
- Mercurial library for Python:
- `pip install mercurial`
- If Python not installed: install from https://www.python.org/
- Mercurial client
- Install from https://www.mercurial-scm.org/
#### Getting started (example repository)
- Prepare Mercurial “server-side” repo (any Hg repo can be pushed to)
- Example: hello world repo
- `hg clone http://selenic.com/repo/hello /tmp/hello`
- Clone using Git (Hg remote helper prefix)
- `git clone hg::/tmp/hello /tmp/hello-git`
- Verify history
- `git log --oneline --graph --decorate`
- You may see many refs displayed; helper creates multiple refs to represent Hg concepts
#### Under-the-hood mapping (how Git refs represent Hg concepts)
- Inspect actual refs on disk
- `tree .git/refs`
- Key internal namespaces created by helper
- `refs/hg/...`
- Holds the “real” remote refs managed by helper
- Separates:
- Mercurial branches (e.g., `refs/hg/origin/branches/default`)
- Mercurial bookmarks (e.g., `refs/hg/origin/bookmarks/master`)
- `refs/notes/hg` (on disk: `.git/refs/notes/hg`)
- Stores mapping between Git commit hashes and Mercurial changeset IDs
- Implemented using Git notes (tree of mappings)
- Concept
- Key: Git commit SHA-1
- Value: Mercurial changeset ID
- Practical takeaway
- Most users can ignore these implementation details during normal workflows
#### Ignoring files (Hg ↔ Git)
- Goal
- Respect Mercurial ignore rules locally without committing `.gitignore` to an Hg project
- Approach
- Copy Hg ignore file into Git’s local-only exclude file
- `cp .hgignore .git/info/exclude`
- Why it works
- `.git/info/exclude` behaves like `.gitignore` but is not committed
- The copy works when `.hgignore` uses `syntax: glob`; Mercurial's default regexp syntax has no Git equivalent and must be translated by hand
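A more cautious version of the copy step is sketched below. It accepts only glob-syntax patterns and refuses to write the exclude file when `.hgignore` relies on regexp syntax, which is Mercurial's default. The function name is hypothetical. Only the `syntax:` line and the `glob:`, `re:` and `regexp:` prefixes are handled.

```shell
# Hypothetical helper: copy glob patterns from an .hgignore into Git's
# local-only exclude file. Fails (and writes nothing) if any pattern uses
# regexp syntax, which Git ignore files cannot express.
hgignore_to_exclude() {
  src=${1:-.hgignore}
  dest=${2:-.git/info/exclude}
  out=$(awk '
    /^[[:space:]]*$/ { next }                     # blank line
    /^[[:space:]]*#/ { next }                     # comment
    /^syntax:/ {                                  # switch pattern syntax
      mode = $0
      sub(/^syntax:[[:space:]]*/, "", mode)
      sub(/[[:space:]]*$/, "", mode)
      next
    }
    /^glob:/ { sub(/^glob:/, ""); print; next }   # per-line glob prefix
    /^(re|regexp):/ { bad = 1; exit }             # per-line regexp prefix
    mode == "glob" { print; next }
    { bad = 1; exit }                             # regexp mode (the default)
    END { exit bad }
  ' "$src") || { echo "$src uses regexp patterns; translate by hand" >&2; return 1; }
  printf '%s\n' "$out" > "$dest"
}

# Usage from the root of the Git clone:
#   hgignore_to_exclude
```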
#### Typical workflow (clone → commit → fetch/merge → push)
- Local work and commits on `master`
- Example log: local commits ahead of `origin/master`
- Check for remote changes
- `git fetch`
- May advance `origin/master` (from Hg changes made by others)
- Handle divergence
- Mercurial supports merges, so you can do a normal Git merge:
- `git merge origin/master`
- Share work
- `git push`
- Verify on Mercurial side
- `hg log -G --style compact`
- Result
- Hg changesets created from Git commits appear in Hg history (including merges)
#### Branches and bookmarks (concept mapping and operations)
- Conceptual differences
- Git: one kind of branch (moving ref)
- Mercurial: two related concepts
- Bookmark: moving pointer (like Git branch)
- Branch (heavyweight): branch name stored in each changeset; permanently part of history
- Why helper must care
- Git can represent both with refs, but Mercurial’s semantics differ
##### Creating Mercurial bookmarks via Git branches
- Git side
- `git checkout -b featureA`
- `git push origin featureA`
- Mercurial side
- `hg bookmarks` shows bookmark `featureA`
- Hg log shows `[featureA]` annotation on appropriate revision
- Limitation
- Bookmark deletion not supported from Git side (remote helper limitation)
##### Working with Mercurial heavyweight branches via Git
- Create branch in Git under the `branches/` namespace
- `git checkout -b branches/permanent`
- commit changes
- `git push origin branches/permanent`
- Mercurial side
- `hg branches` shows `permanent` with tip changeset
- `hg log -G` shows:
- `branch: permanent` recorded in the changeset itself
##### History rewriting warning (Hg is append-only)
- Mercurial generally does not support rewriting published history; it adds new changesets instead
- If you do interactive rebase + force-push from Git
- New changesets are created
- Old changesets remain in repo history
- Risk
- Can be very confusing to Mercurial users
- Guidance
- Avoid rewriting history that has left your machine
#### Mercurial summary
- Working across Git/Hg boundary is typically low-friction
- If you avoid rewriting shared history, you may barely notice the remote is Mercurial
### Git and Bazaar (bzr) — `git-remote-bzr`
#### Context
- Bazaar (GNU Project) is a DVCS but behaves differently from Git
- Different keywords for similar operations
- Some common Git terms differ in meaning
- Branch management is notably different → potential confusion for Git users
- Still possible to work on Bazaar repos from Git with a remote helper
#### Bridge overview: remote helper `git-remote-bzr`
- Project: https://github.com/felipec/git-remote-bzr
- Enables `git clone`/`fetch`/`push` against Bazaar repositories
#### Installation checklist
- Install helper script into PATH
- `wget https://raw.github.com/felipec/git-remote-bzr/master/git-remote-bzr -O ~/bin/git-remote-bzr`
- `chmod +x ~/bin/git-remote-bzr`
- Install Bazaar client (`bzr`)
#### Creating a Git repository from a Bazaar repository
- Clone using `bzr::` prefix
- Recommendation
- Don’t attach Git clone to a *local* Bazaar clone
- even though both are full clones
- Prefer attaching Git clone directly to the *central* Bazaar repository
- Example
- Remote: `bzr+ssh://developer@mybazaarserver:myproject`
- Git clone:
- `git clone bzr::bzr+ssh://developer@mybazaarserver:myproject myProject-Git`
- `cd myProject-Git`
- Post-clone optimization (disk compaction)
- `git gc --aggressive`
- Especially helpful for big repositories
#### Bazaar branches and cloning behavior
- Bazaar allows cloning branches; a repository may contain multiple branches
- `git-remote-bzr` can clone:
- A specific branch
- `git clone bzr::bzr://bzr.savannah.gnu.org/emacs/trunk emacs-trunk`
- All branches in a repository
- `git clone bzr::bzr://bzr.savannah.gnu.org/emacs emacs`
- Fetch only selected branches
- Configure:
- `git config remote-bzr.branches 'trunk, xwindow'`
- When remote repo does not allow listing branches
- Manually specify branch list and fetch
- `git init emacs`
- `git remote add origin bzr::bzr://bzr.savannah.gnu.org/emacs`
- `git config remote-bzr.branches 'trunk, xwindow'`
- `git fetch`
#### Ignoring files (Bazaar `.bzrignore` ↔ Git ignores)
- Core concern
- You shouldn’t create/commit `.gitignore` into a Bazaar-managed project
- Could disturb Bazaar users
- Solution
- Use `.git/info/exclude` (local-only ignores)
- Implement as:
- symbolic link to `.bzrignore`, or
- regular file that mirrors `.bzrignore`
- Bazaar ignore features beyond Git
- `!!` prefix
- ignore patterns even if re-included by a later `!` rule
- `RE:` prefix
- Python regular expression pattern (Git supports only glob patterns)
- Two cases
- Case A: `.bzrignore` has no `!!` and no `RE:` lines
- Safe to symlink:
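- `ln -sf ../../.bzrignore .git/info/exclude`
- The link target is resolved relative to `.git/info/`, and `-f` replaces the default `exclude` file that `git init` creates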
- Case B: `.bzrignore` contains `!!` and/or `RE:`
- Must create/edit `.git/info/exclude` manually to match ignore behavior
- Ongoing maintenance warning
- Must monitor changes to `.bzrignore`
- If `.bzrignore` changes to include unsupported syntax:
- remove symlink (if used)
- copy `.bzrignore` into `.git/info/exclude`
- adapt patterns
- Git exclusion caveat
- In Git, if a parent directory is excluded, you cannot later re-include a file inside it (after `build/`, a later `!build/keep.txt` has no effect; exclude `build/*` instead)
- Be careful translating Bazaar ignore semantics
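The two cases can be told apart mechanically. This is a sketch that runs from the root of the Git clone and assumes `.bzrignore` lives there. It symlinks when no `!!` or `RE:` lines exist. Otherwise it leaves a plain copy to be adapted by hand and returns 2. The function name is hypothetical.

```shell
# Hypothetical helper: wire .bzrignore into Git's local-only exclude file.
link_bzrignore() {
  exclude=.git/info/exclude
  [ -f .bzrignore ] || { echo "no .bzrignore here" >&2; return 1; }
  if grep -q -e '^!!' -e '^RE:' .bzrignore; then
    # Case B: syntax Git cannot interpret; copy so the patterns can be adapted.
    rm -f "$exclude"                  # also drops an earlier symlink
    cp .bzrignore "$exclude"
    echo "copied; edit $exclude to translate !! and RE: lines" >&2
    return 2
  fi
  # Case A: plain patterns. The target is relative to .git/info, and -f
  # replaces the default exclude file created by git init.
  ln -sf ../../.bzrignore "$exclude"
}
```

Re-running it after `.bzrignore` changes covers the maintenance warning above: a symlink is replaced by a copy as soon as unsupported syntax appears.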
#### Fetching from Bazaar remote (Git-side)
- Use normal Git commands
- Example (if working on `master`)
- `git pull --rebase origin`
- Rebases your local work onto `origin/master` (a plain `git pull` would merge instead)
#### Pushing to Bazaar remote (Git-side)
- Bazaar supports merge commits
- Pushing merge commits is acceptable
- Typical flow
- work on branches
- merge into `master`
- push:
- `git push origin master`
#### Caveats (remote-helper limitations)
- Some push operations aren’t supported / behave unexpectedly
- Branch deletion:
- `git push origin :branch-to-delete` (doesn’t work)
- Refspec rename:
- `git push origin old:new` (pushes `old`)
- Dry-run:
- `git push --dry-run origin branch` (will push anyway)
#### Bazaar summary
- Bazaar and Git are similar enough for reasonable interoperability
- Key to success
- Know the remote isn’t native Git
- Respect remote-helper limitations
### Git and Perforce
#### Context
- Perforce (1995) — oldest VCS covered in chapter
- Designed for constraints of its era
- Assumes a central server that is always reachable
- Only one version stored locally
- Still widely used in corporate settings
- Two ways to mix Git with Perforce
- Git Fusion (server-side)
- git-p4 (client-side)
#### Option 1: Perforce Git Fusion (server-side bridge)
##### Overview
- Product by Perforce: Git Fusion
- http://www.perforce.com/git-fusion
- Synchronizes Perforce server with Git repositories on server side
- Exposes Perforce depot subtrees as read-write Git repos
##### Setting up Git Fusion (example: Perforce-provided VM)
- Installation method used in chapter
- Download virtual machine image with Perforce daemon + Git Fusion
- http://www.perforce.com/downloads/Perforce/20-User
- Import into virtualization software (VirtualBox in example)
- First boot configuration prompts
- Set passwords for Linux users:
- `root`, `perforce`, `git`
- Provide instance name (distinguish installations on same network)
- Note VM IP address (needed for cloning over HTTPS)
- Create a Perforce user (as root on VM)
- `p4 -p localhost:1666 -u super user -f john`
- Opens an editor (vi); accept the defaults with `:wq`
- `p4 -p localhost:1666 -u john passwd`
- Enter password twice
- `exit`
- SSL certificate workaround for example
- VM certificate doesn’t match IP → Git rejects HTTPS
- Temporary bypass:
- `export GIT_SSL_NO_VERIFY=true`
- For real installs: install correct certificate per Git Fusion manual
- Test clone of sample repo (Talkhouse)
- `git clone https://<IP>/Talkhouse`
- Prompts for credentials (john)
- Credential cache helps subsequent commands
- Figure reference
- Figure 145: Git Fusion virtual machine boot screen (shows IP)
##### Git Fusion configuration (via Perforce client)
- Configuration lives in Perforce depot path
- `//.git-fusion` directory
- Map `//.git-fusion` into a Perforce workspace and browse/edit
- Directory structure (high level)
- `objects/`, `repos/`, `trees/` (used internally to map Perforce objects to Git and back; apart from the per-repo config files, not edited by hand)
- global `p4gf_config`
- per-repo config: `repos/<RepoName>/p4gf_config`
- user mapping: `users/p4gf_usermap`
- Global `p4gf_config` characteristics
- INI-style text file
- Global defaults; can be overridden by repo-specific configs
- Key sections shown (examples)
- `[repo-creation] charset = utf8`
- `[git-to-perforce]`
- `change-owner = author`
- `enable-git-branch-creation = yes`
- `enable-swarm-reviews = yes`
- `enable-git-merge-commits = yes`
- `enable-git-submodules = yes`
- `preflight-commit = none`
- `ignore-author-permissions = no`
- `read-permission-check = none`
- `git-merge-avoidance-after-change-num = 12107`
- `[perforce-to-git]` (`http-url`, `ssh-url`)
- `[@features]` feature flags (imports, chunked-push, matrix2, parallel-push)
- `[authentication] email-case-sensitivity = no`
- Repo-specific `p4gf_config`
- Contains `[@repo]` section with per-repo overrides
- Contains Perforce-branch ↔ Git-branch mappings via named sections
##### Branch mapping and view mappings (Git Fusion)
- Mapping section example
- `[Talkhouse-master]`
- `git-branch-name = master`
- `view = //depot/Talkhouse/main-dev/... ...`
- Purpose of settings
- `git-branch-name`
- Choose friendlier Git branch names (avoid awkward Perforce paths)
- `view`
- Defines how Perforce files map into the Git repository
- Uses standard Perforce view mapping syntax
- Multi-project mapping example
- One Git branch can combine multiple Perforce depots/subtrees into subdirectories
- Example view:
- `//depot/project1/main/... project1/...`
- `//depot/project2/mainline/... project2/...`
##### User identity mapping (Git Fusion: `users/p4gf_usermap`)
- Purpose
- Map Perforce users to Git author identities (and vice versa)
- Default mapping behavior (without usermap)
- Perforce → Git
- Look up Perforce user; use stored full name + email in Git commit
- Git → Perforce
- Look up Perforce user by email in Git commit author field
- Submit changeset as that Perforce user (permissions apply)
- Mapping file line format
- `<user> <email> "<full name>"`
- Use cases
- Multiple emails mapping to one Perforce account
- Supports commits authored under different emails but attributed to same Perforce user
- Anonymization / masking internal directory
- Replace real names/emails with fictional/anonymous ones in exported Git commits
- Matching behavior detail
- When creating Git commit from Perforce changeset:
- first matching line for Perforce user supplies Git author info
- Uniqueness recommendation
- email + full name should be unique unless intentionally collapsing attribution
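To illustrate the first-match rule, here is a sketch that resolves a Perforce user to a Git author string from a file in the `<user> <email> "<full name>"` format. The function name and sample entries are hypothetical, and this is not Git Fusion's own code.

```shell
# Hypothetical helper: print "Full Name <email>" for the first line in a
# p4gf_usermap-style file whose first field matches the Perforce user.
usermap_author() {
  user=$1 map=$2
  awk -v u="$user" '
    /^[[:space:]]*#/ { next }                # skip comment lines
    $1 == u {
      name = $0
      sub(/^[^"]*"/, "", name)               # drop text up to the opening quote
      sub(/"[[:space:]]*$/, "", name)        # drop the closing quote
      print name " <" $2 ">"
      found = 1
      exit                                   # first matching line wins
    }
    END { exit (found ? 0 : 1) }
  ' "$map"
}

# Usage: usermap_author john users/p4gf_usermap
```

With two `john` lines, only the first one's email appears in generated Git commits, which is why the text recommends unique email and name pairs unless collapsing attribution is intended.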
##### Workflow with Git Fusion (from the Git side)
- Clone a Git Fusion repository (example Jam)
- `git clone https://<IP>/Jam`
- Initial clone behavior
- Git Fusion converts applicable Perforce changesets → Git commits on server
- Takes time proportional to history size
- Later fetches are incremental and run at close to native Git speed
- Result feels like a normal Git repo
- Typical refs:
- `master`
- `origin/master`, `origin/rel2.1`, etc.
- Standard Git workflow applies
- Make commits locally
- `git fetch` to update remote-tracking branches
- `git merge origin/master` to integrate updates
- `git push` to publish back
- Push mechanics (visible output)
- Git Fusion runs conversion back into Perforce:
- loads commit tree
- finds child commits
- runs `git fast-export`
- checks commits
- copies changelists
- submits new Git commit objects to Perforce
- Note: processing may continue even if connection closes
- Perforce-side visualization
- p4v revision graph shows merge structure akin to Git
- If Perforce lacks a named branch for Git-side commits
- Git Fusion creates an “anonymous” branch under `.git-fusion` to hold them
- Figure reference
- Figure 146: Perforce revision graph resulting from Git push
##### Git Fusion summary
- Advantages
- First-class interoperability when server admin can install it
- Supports many “full Git” features comfortably
- merge commits → recorded as Perforce integrations
- submodules (though may look odd to Perforce users)
- Limitations
- Will reject rewriting history that has already been pushed
- If Git Fusion not possible
- Use client-side `git-p4`
#### Option 2: `git-p4` (client-side Perforce bridge)
##### Overview
- Two-way bridge between Git and Perforce
- Runs entirely inside your Git repository
- No special Perforce server configuration required
- Less flexible/comprehensive than Git Fusion
- But “good enough” for many workflows
##### Prerequisites / notes
- Requires `p4` CLI tool in your PATH
- Free download (as referenced in chapter):
- http://www.perforce.com/downloads/Perforce/20-User
- Must set environment variables for Perforce connection (example)
- `export P4PORT=10.0.1.254:1666`
- `export P4USER=john`
##### Getting started: cloning from Perforce
- Command
- `git p4 clone //depot/www/live www-shallow`
- Result characteristics
- “Shallow” import by default
- imports only latest Perforce revision (`#head`)
- aligns with Perforce’s “not everyone has all history” model
- Git view after clone
- local `master`
- Perforce state refs:
- `p4/master`
- `p4/HEAD`
- Important nuance: no Git remotes created
- `git remote -v` → no remotes
- Perforce state is represented as refs, not a Git-managed remote
##### Workflow: sync, rebase, submit
- Local development
- commit locally on `master`
- Get latest from Perforce
- `git p4 sync`
- incremental import into `refs/remotes/p4/master`
- Keep history linear before submitting
- divergence between `master` and `p4/master` is possible
- recommended: rebase local commits on top of Perforce head
- shortcut:
- `git p4 rebase`
- effectively: `git p4 sync` + `git rebase p4/master`
- (with extra smarts for multi-branch situations)
- Submit work back to Perforce
- `git p4 submit`
- creates a Perforce changelist per Git commit between `p4/master` and `master`
- opens editor for each changelist specification
- imports Git commit message into Perforce change description
- includes diff content for context
- Authorship mismatch warning (during submit)
- If Git author email doesn’t match your Perforce account:
- message suggests:
- `--preserve-user` to modify authorship
- set `git-p4.skipUserNameCheck` to hide warning
- After submit completes
- git-p4 performs another incremental import
- rebases current branch onto `p4/master`
- effect resembles a `git push` workflow
- Commit rewriting
- Submitted commits’ SHA-1 hashes change
- git-p4 appends metadata line to commit message, e.g.:
- `[git-p4: depot-paths = "//depot/www/live/": change = 12144]`
- Squashing strategy
- To combine multiple Git commits into one Perforce changeset:
- interactive rebase (squash) before `git p4 submit`
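Because `git p4 submit` appends the `[git-p4: ... change = N]` line shown above, the Perforce changelist for a submitted commit can be read back from its message. This is a sketch that reads a message on stdin and tolerates extra fields after the change number. The function name is hypothetical.

```shell
# Hypothetical helper: print the Perforce change number from the metadata
# line git-p4 appends to commit messages (reads the message on stdin).
p4_change_of() {
  sed -n 's/^\[git-p4: .*change = \([0-9][0-9]*\).*/\1/p'
}

# Usage in a git-p4 clone:
#   git log -1 --format=%B p4/master | p4_change_of
```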
##### What about merge commits?
- Perforce branching model differs; merge commits aren’t meaningful in Perforce changelist history
- `git p4 submit` behavior with merge commits
- ignores merge commits
- applies only the non-merge commits that aren’t in Perforce yet
- Net effect
- history becomes linear on submission (as though you rebased)
- Practical implication
- You can branch and merge freely in Git locally
- As long as you can rebase/linearize before submitting
- Caveat
- Perforce integration metadata (branch lineage) is not preserved; only file-level changes are recorded
##### Branching with `git-p4`
- Example Perforce depot layout
- `//depot/project/main`
- `//depot/project/dev`
- Example Perforce branch spec view
- `//depot/project/main/... //depot/project/dev/...`
- Clone with branch detection
- `git p4 clone --detect-branches //depot/project@all`
- `@all` imports all changesets that ever touched those paths (full history)
- imports additional branches (e.g., `project/dev`)
- updates branches list (e.g., `main dev`)
- When Perforce branch specs aren’t present
- Configure branch relationships manually
- `git init project`
- `git config git-p4.branchList main:dev`
- declares `main` and `dev`; `dev` is child of `main`
- `git p4 clone --detect-branches //depot/project@all .`
- Working with detected branches
- Create local branch from Perforce branch ref
- `git checkout -b dev p4/project/dev`
- `git p4 submit` targets correct Perforce branch automatically
- Limitations and operational constraints
- Cannot mix shallow clones with multiple branches
- For huge projects needing multiple submit targets
- may need one `git p4 clone` per branch to submit to
- Branch creation/integration must be done with Perforce tools
- git-p4 can only sync/submit to existing branches
- can only submit one linear changeset at a time
- merge/integration metadata is lost if merging in Git
##### Git + Perforce summary
- `git-p4` enables Git-style local workflow with Perforce as server-of-record
- Be careful about sharing Git commits
- Don’t push commits to shared Git remotes unless already submitted to Perforce
- If possible and approved by admin
- Git Fusion provides more seamless, first-class integration
## Part 2 — Migrating to Git (converting repositories into native Git)
### Why migrate
- Adopt Git as primary VCS for an existing codebase
- Goals
- Preserve history as much as possible
- Clean up author/branch/tag data during conversion
- Strategy
- Use system-specific importers when available
- Otherwise use `git fast-import` with a custom converter
### Migrating from Subversion (SVN)
#### Simple path (but imperfect)
- Use `git svn clone` to import
- Stop using SVN and push resulting Git repo to a new Git server
- Caveat
- The naive import is imperfect, and because it takes a long time anyway, it is worth doing a cleaner import
#### Author mapping (SVN usernames → Git identities)
- Problem
- SVN records commit “author” as a username on the SVN system
- Git prefers full identity: `Full Name <email>`
- Create `users.txt` mapping file
- Format:
- `svnuser = Full Name <email>`
- Example:
- `schacon = Scott Chacon <schacon@geemail.com>`
- `selse = Someo Nelse <selse@geemail.com>`
- Generate initial list of SVN author names
- `svn log --xml --quiet | grep author | sort -u | perl -pe 's/.*>(.*?)<.*/$1 = /'`
- Then redirect output into `users.txt` and fill in names/emails
- Windows note
- Migration steps may require special tooling; referenced guidance:
- https://docs.microsoft.com/en-us/azure/devops/repos/git/perform-migration-from-svn-to-git
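The author-list pipeline above uses Perl. Here is a sketch of the same extraction with `sed`. It reads `svn log --xml --quiet` output on stdin, so it can be tried on a saved log. The function name is hypothetical.

```shell
# Hypothetical helper: turn `svn log --xml --quiet` output into a users.txt
# skeleton with one "svnuser = " line per distinct author.
svn_authors_skeleton() {
  sed -n 's/.*<author>\(.*\)<\/author>.*/\1 = /p' | sort -u
}

# Usage in an SVN working copy:
#   svn log --xml --quiet | svn_authors_skeleton > users.txt
# then complete each line as "svnuser = Full Name <email>".
```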
#### Cleaner `git svn clone` for migration
- Recommended command pattern (a single command)
- `git svn clone http://my-project.googlecode.com/svn/ --authors-file=users.txt --no-metadata --prefix "" -s my_project`
- Option rationale
- `--authors-file`
- improves Author field quality in Git commits
- `--no-metadata`
- removes `git-svn-id` lines in commit messages (cleaner logs)
- WARNING: keep metadata if you intend to mirror back to original SVN repo
- `--prefix ""`
- imports refs directly under `refs/remotes/` with no `origin/` prefix (the cleanup loops rely on this layout)
- `-s`
- assumes standard SVN trunk/branches/tags layout
#### Post-import cleanup (make imported refs idiomatic Git)
- Convert SVN tags (remote refs) into real Git tags
- Problem
- With `--prefix ""`, `git svn` stores tags as remote refs under `refs/remotes/tags/...`
- Conversion loop (Bash syntax; creates lightweight tags and deletes the remote tag refs)
- `for t in $(git for-each-ref --format='%(refname:short)' refs/remotes/tags); do`
- `git tag ${t/tags\//} $t && git branch -D -r $t;`
- `done`
- Convert remaining remote refs into local branches
- `for b in $(git for-each-ref --format='%(refname:short)' refs/remotes); do git branch $b refs/remotes/$b && git branch -D -r $b; done`
- Remove peg-revision branches (optional cleanup)
- Symptom
- extra branches suffixed with `@<number>` (SVN “peg-revisions”)
- If you don’t need them:
- `for p in $(git for-each-ref --format='%(refname:short)' | grep @); do git branch -D $p; done`
- Remove redundant `trunk` branch
- `git svn` creates an extra `trunk` branch (mapping SVN's default branch) that points to the same commit as `master`
- Remove:
- `git branch -d trunk`
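Because these loops delete refs, it can help to preview what they will do. The hypothetical Ruby helper below takes the short names printed by `git for-each-ref --format='%(refname:short)' refs/remotes` and only prints the equivalent `git` commands in the same order as the steps above; it assumes the `--prefix ""` layout and runs nothing itself.

```ruby
# Dry-run planner for the post-import cleanup (hypothetical helper).
# Input: short ref names under refs/remotes, e.g. "trunk", "tags/v1.0".
def cleanup_plan(remote_refs)
  tags, branches = remote_refs.partition { |ref| ref.start_with?("tags/") }
  plan = []
  # SVN tags -> lightweight Git tags, then drop the remote tag refs
  tags.each do |t|
    plan << "git tag #{t.delete_prefix('tags/')} refs/remotes/#{t}"
    plan << "git branch -D -r #{t}"
  end
  # Remaining remote refs -> local branches, then drop the remote refs
  branches.each do |b|
    plan << "git branch #{b} refs/remotes/#{b}"
    plan << "git branch -D -r #{b}"
  end
  # Optional: peg-revision branches (name@123)
  branches.select { |b| b.include?("@") }.each { |p| plan << "git branch -D #{p}" }
  # The extra trunk branch duplicates master
  plan << "git branch -d trunk" if branches.include?("trunk")
  plan
end

# Usage: git for-each-ref --format='%(refname:short)' refs/remotes > refs.txt
#        ruby cleanup_plan.rb refs.txt
if $PROGRAM_NAME == __FILE__ && ARGV[0]
  puts cleanup_plan(File.readlines(ARGV[0], chomp: true).reject(&:empty?))
end
```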
#### Push migrated repo to Git server
- Add remote
- `git remote add origin git@my-git-server:myrepository.git`
- Push all branches
- `git push origin --all`
- Push tags
- `git push origin --tags`
### Migrating from Mercurial (Hg)
#### Why it’s straightforward
- Git and Mercurial represent versions in fairly similar ways
- Git is a bit more flexible, so Mercurial history maps onto it cleanly
#### Tool: `hg-fast-export`
- Acquire tool
- `git clone https://github.com/frej/fast-export.git /tmp/fast-export` (cloned into `/tmp/fast-export` to match the export command in the steps below)
#### Steps
- Full clone the Mercurial repo to convert
- `hg clone <remote repo URL> /tmp/hg-repo`
- Create author mapping file (optional cleanup but often necessary)
- Generate list:
- `cd /tmp/hg-repo`
- `hg log | grep user: | sort | uniq | sed 's/user: *//' > ../authors`
- Convert each line into rule syntax:
- `"<input>"="<output>"`
- Notes
- Mercurial allows looser author strings than Git
- Mapping file can normalize duplicates, fix invalid formats
- Supports Python `string_escape` sequences in mapping strings
- Unmatched inputs pass through unchanged
- Also usable to rename branches/tags if Mercurial names invalid in Git
- Branch mapping: `-B`
- Tag mapping: `-T`
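Converting the raw author list into rule syntax by hand gets tedious. The Ruby sketch below (hypothetical helper `hg_author_template`) emits one `"<input>"="<input>"` line per unique author, listing entries that aren't already in `Name <email>` form first, so you only have to edit right-hand sides. It assumes no author string contains `"` or a backslash; those would need escaping under the mapping file's Python-style escape rules, which this sketch does not do.

```ruby
# Rough check for a Git-style "Name <email>" identity.
GIT_IDENT = /\A[^<>]+ <[^<>]*>\z/

# Raw list (one author per line) -> starter mapping lines, identity-mapped.
# Authors that don't look like "Name <email>" come first, since they need fixing.
def hg_author_template(raw)
  authors = raw.lines.map(&:strip).reject(&:empty?).uniq
  needs_fix, ok = authors.partition { |a| !GIT_IDENT.match?(a) }
  (needs_fix + ok).map { |a| %("#{a}"="#{a}") }
end

# Usage: ruby hg_authors.rb /tmp/authors > /tmp/authors.new  (then edit and rename)
if $PROGRAM_NAME == __FILE__ && ARGV[0]
  puts hg_author_template(File.read(ARGV[0]))
end
```

Since unmatched inputs pass through unchanged, you can also delete the identity lines you don't need to change.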
- Create a new Git repository and run export
- `git init /tmp/converted`
- `cd /tmp/converted`
- `/tmp/fast-export/hg-fast-export.sh -r /tmp/hg-repo -A /tmp/authors`
- Output expectations
- Exporter reports per-revision progress and file deltas
- Mercurial tags exported to Git tags
- Mercurial branches/bookmarks become Git branches
- Ends with `git-fast-import` statistics
- Validate author consolidation
- `git shortlog -sn`
- Publish to Git server
- `git remote add origin git@my-git-server:myrepository.git`
- `git push origin --all`
### Migrating from Bazaar (bzr)
#### Tooling: Bazaar fast-export → Git fast-import
- Requires `bzr-fastimport` plugin (and Python module dependencies)
#### Install `bzr-fastimport`
- Linux/Unix-like (preferred: package manager)
- Debian/Ubuntu:
- `sudo apt-get install bzr-fastimport`
- RHEL:
- `sudo yum install bzr-fastimport`
- Fedora 22+:
- `sudo dnf install bzr-fastimport`
- If package unavailable: install plugin manually
- `mkdir --parents ~/.bazaar/plugins`
- `cd ~/.bazaar/plugins`
- `bzr branch lp:bzr-fastimport fastimport`
- `cd fastimport`
- `sudo python setup.py install --record=files.txt`
- Ensure Python module `fastimport` is present
- Check:
- `python -c "import fastimport"`
- If missing:
- `pip install fastimport`
- Source:
- https://pypi.python.org/pypi/fastimport/
- Windows
- Standalone/default Bazaar install includes `bzr-fastimport` (no extra steps)
#### Import scenarios
##### Single-branch Bazaar project
- `cd /path/to/the/bzr/repository`
- Initialize Git
- `git init`
- Export + import
- `bzr fast-export --plain . | git fast-import`
- Expected time
- seconds to minutes depending on repo size
##### Bazaar repository with multiple branches (main + working branch)
- Example branch directories
- `myProject.trunk` (main)
- `myProject.work` (working branch)
- Create Git repo
- `git init git-repo`
- `cd git-repo`
- Import trunk as Git master (with marks)
- `bzr fast-export --export-marks=../marks.bzr ../myProject.trunk | \`
- `git fast-import --export-marks=../marks.git`
- Import work branch as Git branch `work` (reusing marks)
- `bzr fast-export --marks=../marks.bzr --git-branch=work ../myProject.work | \`
- `git fast-import --import-marks=../marks.git --export-marks=../marks.git`
- Verify
- `git branch` should show `master` and `work`
- Inspect logs
- Remove mark files (`marks.bzr`, `marks.git`) after confirmation
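The two-branch recipe generalizes to more branches: export the trunk once with `--export-marks`, then each further branch with `--marks` and its own `--git-branch` name, reusing and updating both mark files. The hypothetical helper below only prints those command lines, assuming the same `../marks.bzr` / `../marks.git` layout as the example, so you can review them before running them from inside `git-repo`.

```ruby
require "shellwords"

# Build the bzr fast-export | git fast-import pipelines for a trunk plus any
# number of extra branches (name => directory). Prints commands; runs nothing.
def bzr_import_commands(trunk_dir, branches)
  cmds = ["bzr fast-export --export-marks=../marks.bzr #{trunk_dir.shellescape} | " \
          "git fast-import --export-marks=../marks.git"]
  branches.each do |name, dir|
    # --marks both reads and updates the Bazaar-side marks file
    cmds << "bzr fast-export --marks=../marks.bzr --git-branch=#{name.shellescape} #{dir.shellescape} | " \
            "git fast-import --import-marks=../marks.git --export-marks=../marks.git"
  end
  cmds
end

puts bzr_import_commands("../myProject.trunk", { "work" => "../myProject.work" })
```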
#### Synchronize working directory + index after import
- Issue
- after any import (single or multiple branches), the staging area doesn't match HEAD
- after a multi-branch import, the working directory doesn't match HEAD either
- Fix
- `git reset --hard HEAD`
#### Convert ignore rules (.bzrignore → .gitignore)
- Rename ignore file
- `git mv .bzrignore .gitignore`
- If `.bzrignore` uses Bazaar-only constructs (`!!`, `RE:`)
- modify `.gitignore` (possibly multiple `.gitignore` files) to match behavior
- Commit this conversion as part of migration
- `git commit -am 'Migration from Bazaar to Git'`
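To find the rules that need manual translation, a small Ruby sketch can flag `.bzrignore` lines that start with the Bazaar-only prefixes mentioned above (`!!`, `RE:`). It only reports them; rewriting them as `.gitignore` patterns is still manual. `bzr_only_lines` is a hypothetical helper.

```ruby
# Bazaar-only prefixes that have no direct .gitignore equivalent.
BZR_ONLY_PREFIXES = ["!!", "RE:"].freeze

# Returns [line_number, line] pairs for rules that need manual translation.
def bzr_only_lines(text)
  text.each_line.with_index(1).map { |line, lineno|
    [lineno, line.chomp] if BZR_ONLY_PREFIXES.any? { |prefix| line.start_with?(prefix) }
  }.compact
end

# Usage (before the git mv): ruby bzr_ignore_check.rb .bzrignore
if $PROGRAM_NAME == __FILE__ && ARGV[0]
  bzr_only_lines(File.read(ARGV[0])).each { |lineno, line| puts "#{ARGV[0]}:#{lineno}: #{line}" }
end
```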
#### Publish to Git server
- `git remote add origin git@my-git-server:mygitrepository.git`
- `git push origin --all`
- `git push origin --tags`
### Migrating from Perforce
#### Approach A: Perforce Git Fusion
- Configure project, branches, and user mappings in Git Fusion
- Clone Git Fusion repo (appears native Git)
- Push to a native Git host if desired
- Optionally, Perforce (via Git Fusion) can continue to host Git repos
#### Approach B: `git-p4` as an import tool
- Example: import Jam from Perforce Public Depot
- Set Perforce server
- `export P4PORT=public.perforce.com:1666`
- Import full history of subtree (`@all`)
- `git-p4 clone //guest/perforce_software/jam@all p4import`
- Branches
- If the project's branches are configured with branch views (or are just a set of directories), add `--detect-branches` to `git-p4 clone` to import all of them
- Inspect imported history
- `git log`
- Commits include Perforce change marker line:
- `[git-p4: depot-paths = "...": change = N]`
- Optional cleanup: remove git-p4 marker lines (do this before new work)
- `git filter-branch --msg-filter 'sed -e "/^\[git-p4:/d"'`
- Effect
- rewrites commit history; SHA-1 hashes change
- Publish to new Git server (after cleanup/verification)
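The `filter-branch` cleanup discards the Perforce change numbers along with the marker lines, so you may want to record them first (for example, to map old changelists to new commits). A minimal Ruby sketch, assuming the marker format shown above; `p4_change` and `strip_p4_markers` are hypothetical helpers, and the second one mirrors the `sed` expression.

```ruby
# Matches the marker git-p4 adds to imported commit messages, e.g.
#   [git-p4: depot-paths = "//public/jam/src/": change = 8068]
P4_MARKER = /^\[git-p4: depot-paths = "([^"]*)": change = (\d+)/

# Returns { depot_paths:, change: } for a commit message, or nil if unmarked.
def p4_change(message)
  m = P4_MARKER.match(message)
  return nil unless m
  { depot_paths: m[1], change: m[2].to_i }
end

# Same effect as sed -e "/^\[git-p4:/d": drop lines starting with "[git-p4:".
def strip_p4_markers(message)
  message.lines.reject { |line| line.start_with?("[git-p4:") }.join
end
```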
### A custom importer (when no prebuilt tool exists) — `git fast-import`
#### When to use
- No quality importer exists for your legacy VCS or storage format
- You need customized mapping/cleanup beyond available tools
#### Why `git fast-import`
- Accepts a simple, line-oriented instruction stream on stdin
- Efficiently creates Git objects (blobs/trees/commits/refs/tags)
- Much easier than
- invoking raw/plumbing commands per object, or
- writing raw Git objects directly
#### Example data source: timestamped directory backups
- Source directory structure
- `back_YYYY_MM_DD/` (snapshots)
- `current/` (latest snapshot)
- Goal
- Import each snapshot as a commit in a linear history
- Each commit represents full tree state at that snapshot
#### Git storage reminder (mapping problem to solution)
- Git history is a chain of commit objects, each pointing to its parent (more precisely a DAG, since merge commits have several parents)
- Each commit points to a snapshot (tree)
- So importer must emit
- tree content for each snapshot
- commit metadata + parent linkage
- order of commits
#### Strategy for the example importer
- Walk snapshot directories in order
- For each snapshot:
- create a new commit
- link it to previous commit (parent)
- wipe tree (`deleteall`) and re-add all files (full snapshot approach)
- Notes
- fast-import also supports delta-style imports (add/modify/delete only), but that’s more complex
#### Ruby implementation (key pieces)
- Language choice
- Ruby used for readability and convenience
- Any language works if it can output proper fast-import stream
- Windows newline caution
- `git fast-import` expects LF (not CRLF)
- Ruby fix:
- `$stdout.binmode`
##### Main loop (iterate snapshots)
- Pseudocode shape
- `last_mark = nil`
- `Dir.chdir(ARGV[0]) do`
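- `Dir.glob("*").sort.each do |dir|`
- `next if File.file?(dir)`
- `Dir.chdir(dir) do`
- `last_mark = print_export(dir, last_mark)`
- `end`
- `end`
- `end`
- Sorting matters: `back_YYYY_MM_DD` names sort chronologically and `current` sorts after them, but `Dir.glob` only returns sorted results by default on Ruby 3.0+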
##### Marks (fast-import commit identifiers)
- Definition
- “mark” is an integer ID used to reference commits within fast-import stream
- Implementation: map directory names to sequential integers
- Global: `$marks = []`
- `convert_dir_to_mark(dir)`
- add dir to `$marks` if not already present
- return `($marks.index(dir) + 1).to_s`
##### Dates (commit timestamps from directory names)
- Need integer timestamp for committer line
- `convert_dir_to_date(dir)`
- if `dir == 'current'` → `Time.now().to_i`
- else
- strip prefix `back_`
- parse `year, month, day`
- use `Time.local(year, month, day).to_i`
##### Author/committer identity
- Hardcoded for example
- `$author = 'John Doe <john@example.com>'`
##### Fast-import commit record structure (what gets printed)
- For each snapshot commit:
- `commit refs/heads/master`
- `mark :<mark>`
- `committer <author> <timestamp> -0700`
- timezone hardcoded as `-0700` in example
- commit message via `data` directive:
- `"imported from <dir>"`
- parent link (except first commit):
- `from :<last_mark>`
- tree content:
- `deleteall`
- for each file: `M <mode> inline <path>` + inline `data` (file content)
##### Helper: exporting data blocks (`data <size>\n<content>`)
- Used for both
- commit messages
- file contents
- `export_data(string)`
- prints:
- `data #{string.bytesize}\n#{string}`
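`git fast-import` reads exactly `<count>` bytes after `data <count>`, so the count has to be a byte count. In Ruby, `String#size` counts characters, which differs from `String#bytesize` as soon as content is non-ASCII. A short demonstration with a hypothetical `data_command` helper:

```ruby
# Hypothetical helper: the "data" header must carry a byte count.
def data_command(string)
  "data #{string.bytesize}\n#{string}\n" # trailing LF after the payload is optional
end

text = "naïve\n"
puts text.size     # 6 (characters)
puts text.bytesize # 7 (bytes in UTF-8: "ï" takes two)
print data_command(text)
```

With `size` instead, the header would say `data 6`, fast-import would stop one byte early, and the leftover byte would corrupt the next command.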
##### Helper: writing a file blob inline
- `inline_data(file, code = 'M', mode = '644')`
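- `content = File.binread(file)` (reads the file's exact bytes, with no newline conversion on Windows)
- `puts "#{code} #{mode} inline #{file}"`
- `export_data(content)`
- Mode notes
- `644` for normal files
- the example always uses the default `644`; executables need `755` (e.g., pass it when `File.executable?(file)` is true)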
##### `print_export(dir, last_mark)` responsibilities
- Compute metadata
- `date = convert_dir_to_date(dir)`
- `mark = convert_dir_to_mark(dir)`
- Print commit header + metadata + message
- Print parent link if present
- Print `deleteall`
- Walk all files in snapshot
- `Dir.glob("**/*")` (skips dotfiles unless `File::FNM_DOTMATCH` is passed as a flag)
- `next if !File.file?(file)`
- `inline_data(file)`
- Return `mark` to become next iteration’s `last_mark`
##### Full script structure (as presented)
- Shebang
- `#!/usr/bin/env ruby`
- Windows newline fix
- `$stdout.binmode`
- Globals
- `$author = "John Doe <john@example.com>"`
- `$marks = []`
- Functions
- `convert_dir_to_mark`
- `convert_dir_to_date`
- `export_data`
- `inline_data`
- `print_export`
- Main loop (iterates snapshot directories, updating `last_mark`)
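Putting the pieces together, one possible complete `import.rb` follows. It is a sketch that keeps the structure listed above, with a few deliberate changes: output goes to an `out` parameter (default `$stdout`) so the stream can be inspected, a `run_import` wrapper holds the main loop, directories and files are sorted, the `data` count uses `bytesize`, files are read with `File.binread`, and executables get mode `755`. Hidden (dot) files are not imported.

```ruby
#!/usr/bin/env ruby
# Sketch of import.rb: back_YYYY_MM_DD/ and current/ snapshots -> fast-import stream.
$stdout.binmode # LF-only output, even on Windows

$author = "John Doe <john@example.com>"
$marks = []

# Map each snapshot directory to a stable, 1-based fast-import mark.
def convert_dir_to_mark(dir)
  $marks << dir unless $marks.include?(dir)
  ($marks.index(dir) + 1).to_s
end

# Unix timestamp from the directory name; `current` gets "now".
def convert_dir_to_date(dir)
  return Time.now.to_i if dir == "current"
  year, month, day = dir.sub("back_", "").split("_").map(&:to_i)
  Time.local(year, month, day).to_i
end

# `data <byte count>` block; the LF after the payload is optional.
def export_data(out, string)
  out.print "data #{string.bytesize}\n#{string}\n"
end

# One file as an inline blob; executables get 755, everything else 644.
def inline_data(out, file, code = "M")
  mode = File.executable?(file) ? "755" : "644"
  out.puts "#{code} #{mode} inline #{file}"
  export_data(out, File.binread(file))
end

# Emit one commit for the snapshot in the current directory; returns its mark.
def print_export(out, dir, last_mark)
  date = convert_dir_to_date(dir)
  mark = convert_dir_to_mark(dir)

  out.puts "commit refs/heads/master"
  out.puts "mark :#{mark}"
  out.puts "committer #{$author} #{date} -0700"
  export_data(out, "imported from #{dir}")
  out.puts "from :#{last_mark}" if last_mark
  out.puts "deleteall" # full-snapshot approach: rebuild the tree every time

  Dir.glob("**/*").sort.each do |file|
    next unless File.file?(file) # skip directories
    inline_data(out, file)
  end
  mark
end

# Walk snapshot directories in sorted (chronological) order.
def run_import(root, out = $stdout)
  last_mark = nil
  Dir.chdir(root) do
    Dir.glob("*").sort.each do |dir|
      next if File.file?(dir)
      Dir.chdir(dir) { last_mark = print_export(out, dir, last_mark) }
    end
  end
end

run_import(ARGV[0]) if $PROGRAM_NAME == __FILE__ && ARGV[0]
```

It is used the same way as the original: `ruby import.rb /opt/import_from | git fast-import` inside a freshly initialized repository.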
#### Running the importer
- Create target Git repo
- `git init`
- Pipe importer output into `git fast-import`
- `ruby import.rb /opt/import_from | git fast-import`
- Successful run yields
- `git-fast-import statistics` summary (objects, branches, marks, memory, etc.)
- Verify commit history
- `git log`
- Working tree behavior
- After import, nothing is checked out by default
- Populate working directory:
- `git reset --hard master`
#### Extending beyond the example
- `git fast-import` can handle
- file mode changes (e.g., executable bits)
- binary data
- multiple branches
- merges
- tags
- progress indicators
- Reference
- examples in Git source: `contrib/fast-import/`
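As a small illustration of two of these extensions, the hypothetical helper below emits an annotated `tag` record pointing at an already-emitted commit mark, followed by a `progress` line, which `git fast-import` echoes to its standard output as it processes the stream. Tagging each snapshot is an assumption for the example, not part of the importer above.

```ruby
# `data <byte count>` block; the LF after the payload is optional.
def export_data(out, string)
  out.print "data #{string.bytesize}\n#{string}\n"
end

# Annotated tag on an earlier commit mark, then a progress message.
def export_tag(out, name, mark, tagger, timestamp, tz = "-0700")
  out.puts "tag #{name}"
  out.puts "from :#{mark}"
  out.puts "tagger #{tagger} #{timestamp} #{tz}"
  export_data(out, "snapshot #{name}")
  out.puts "progress tagged #{name}"
end

export_tag($stdout, "back_2014_02_03", "1", "John Doe <john@example.com>", Time.now.to_i)
```

In a real importer these records would be printed after the commit whose mark they reference.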
## Chapter wrap-up (Summary)
- You can use Git effectively even when the central system is not Git
- via bridges/remote helpers (`git svn`, `git-remote-hg`, `git-remote-bzr`, Git Fusion, `git-p4`)
- You can migrate repositories from common VCS into native Git
- SVN, Mercurial, Bazaar, Perforce
- plus custom sources via `git fast-import`
- Next step (as hinted in chapter)
- understanding Git internals enables even more precise control over repository data