```markmap # Git and Other Systems (Chapter 7) ## Chapter purpose / big picture - Reality check: you can't always switch every project to Git immediately - Two major goals - Use Git locally while the “official” repository lives in another VCS (Git as a client) - Migrate/convert an existing repository from another VCS into Git (Migrating to Git) - Key idea: “bridges/adapters” let Git interoperate with centralized or other DVCS systems - Recurring caveat theme throughout - Different VCS have different data models (linear history vs merge history, tags/branches semantics, etc.) - Bridges often require constraints (e.g., keep history linear, avoid rewriting) ## Part 1 — Git as a Client (working with non-Git servers) ### What “bridges” enable - Keep Git’s local UX (branching, merging, staging, rebase, cherry-pick, etc.) - Collaborators can keep using their existing VCS server + client tools - Often useful as an incremental adoption path (“sneak Git in”) ### Git and Subversion (SVN) — `git svn` #### Background: why SVN matters - Widely used in open source + corporate environments - Longstanding “default” centralized VCS for many projects - Similar lineage to CVS - SVN constraints that influence workflows - Centralized, linear, single “official” history - Merges recorded differently than in Git (and often more limited) #### Bridge overview: `git svn` - Bidirectional bridge to an SVN server - Lets you - Work locally with Git features (branches, merges, staging, rebase, cherry-pick) - Publish work back to SVN as if using SVN client - Practical role - Helps teams gain Git productivity without server migration - Often a stepping stone (“gateway drug” to DVCS) #### Mental model + rules of thumb (critical differences from pure Git) - You are interacting with Subversion, not Git - Best practices to avoid confusion - Keep history as linear as possible - Prefer rebasing over merging - Avoid merge commits in publishable history - Avoid simultaneously collaborating via a Git remote 
repository - Don’t push to a parallel Git server and SVN at the same time - Don’t rewrite history after publishing to SVN, then try to push again - Team coordination guideline - If some devs use SVN clients and others use `git svn`, everyone should collaborate via the SVN server (single source of truth) #### `git svn` command family (entry point) - Base command: `git svn` - Provides many subcommands - Common ones shown through workflows #### Setting up an SVN repo for examples (local writable mirror) - Need an SVN repository with write access - Tool used: `svnsync` (ships with Subversion) - Create a new local SVN repository - `mkdir /tmp/test-svn` - `svnadmin create /tmp/test-svn` - Enable changing revprops (revision properties) - Add hook: `/tmp/test-svn/hooks/pre-revprop-change` - Content: - `#!/bin/sh` - `exit 0;` - Make executable: `chmod +x /tmp/test-svn/hooks/pre-revprop-change` - Initialize sync metadata - `svnsync init file:///tmp/test-svn http://your-svn-server.example.org/svn/` - Sync revisions into the local mirror - `svnsync sync file:///tmp/test-svn` - Notes - Copies one revision at a time - Very inefficient (but simplest approach) - Remote-to-remote sync can take a long time even for smallish histories #### Getting started: importing SVN into a Git repo - Clone/import SVN repository - Full layout options: - `git svn clone file:///tmp/test-svn -T trunk -b branches -t tags` - Standard layout shorthand: - `git svn clone file:///tmp/test-svn -s` - What this does under the hood - Equivalent to: - `git svn init` then `git svn fetch` - Performance note - Git must check out each SVN revision sequentially and commit it - 100s/1000s of commits can take hours or days - Layout flags meaning - `-T trunk` → trunk directory name - `-b branches` → branches directory name - `-t tags` → tags directory name - `-s` → “standard layout” (implies all of the above) - Customize if SVN repo uses nonstandard paths #### Resulting refs: branches/tags as seen in Git - Inspect 
imported refs - `git branch -a` - `git show-ref` - Important nuance: SVN tags handled as remote refs - `git svn` imports SVN tags as remote refs under: - `refs/remotes/origin/tags/...` - Contrast: native Git clone stores tags directly under: - `refs/tags/...` - Practical implication - You’ll often want post-import cleanup if migrating permanently (covered later) #### Committing back to Subversion - Local Git commit - Example: `git commit -am 'Adding git-svn instructions to the README'` - Publish to SVN - `git svn dcommit` - What `dcommit` does (key behavior) - Takes each local commit atop SVN’s tip and commits it to SVN one-by-one - Rewrites your local Git commits after publishing - Adds a `git-svn-id` line to each commit message - Changes SHA-1s for the commits (history rewritten locally) - Consequence: “SVN first” if dual-publishing - If you must push to both SVN and a Git server: - `dcommit` to SVN first, then push to Git - Because `dcommit` changes commit data #### Pulling in new changes (keeping in sync with SVN) - Symptom: `dcommit` rejected because SVN has advanced - Error example: “Transaction is out of date” - Resolution: rebase against SVN - `git svn rebase` - Fetches changes from SVN you don’t have yet - Rebases your local commits on top of updated SVN tip - May involve conflict resolution - After rebase - `git svn dcommit` should succeed - Behavior difference vs Git server - Git requires integrating upstream before push (always) - `git svn` makes you integrate only when conflicts occur (SVN-like) - Non-conflicting edits in different files may still allow `dcommit` - But `git svn` may still perform a rebase internally - Critical caveat: published state may be “untestable” locally - Because SVN accepts sequential commits without requiring a full pre-tested merged state - Resulting repo state may not have existed on any client machine - Can yield subtle incompatibilities - Keeping updated routinely - Prefer `git svn rebase` periodically - Does fetch + 
updates your branch - Working directory must be clean - Stash or temporarily commit local changes before rebasing #### Git branching issues when SVN is the server - Git encourages topic branches + merges - With `git svn`, prefer rebasing topic work onto mainline - Why - SVN has linear history and doesn’t model merges like Git - `git svn` conversion follows only the first parent when turning Git history into SVN commits - If you `dcommit` a merged history - `dcommit` will succeed, but… - Only the merge commit gets rewritten; the original topic-branch commits won’t appear individually in SVN history - Others cloning will see a “squashed” result - Similar to `git merge --squash` - Lose detailed commit provenance/timing from topic branch #### Subversion branching with `git svn` ##### Creating a new SVN branch - Command: `git svn branch ` - Example: `git svn branch opera` - What it does - Equivalent to `svn copy trunk branches/opera` - Operates on the SVN server - Common gotcha - It does NOT switch your working directory to the new branch - If you commit now, you still commit to SVN trunk (not the new branch) ##### Switching active branches / targeting `dcommit` - How `dcommit` decides where to commit - Looks for tip of an SVN branch (git-svn-id) in your history - Assumption: there should be only one, and it should be the last git-svn-id in your current branch history - Working on multiple SVN branches simultaneously (Git-side strategy) - Create local Git branches rooted at the corresponding imported SVN refs - Example: - `git branch opera remotes/origin/opera` ##### Merging SVN branches using Git - You can merge locally with `git merge` - Example: merge `opera` into trunk (master) - Provide a meaningful merge commit message - Use `-m` to avoid generic “Merge branch opera” - After `dcommit` - SVN can’t store true merge-parent info - `dcommit` will squash merge history into a single commit in SVN - Merge ancestry info is erased → future merge-base calculations in Git 
become wrong - Practical workaround / best practice - After merging a feature branch into trunk and `dcommit`ing: - delete the local feature branch (e.g., `opera`) - avoids later incorrect merges / confusion #### SVN-like helper commands provided by `git svn` ##### SVN-style history - Command: `git svn log` - Properties - Runs offline (unlike `svn log` which queries server) - Shows only commits that have been committed to SVN (dcommitted) - Does not show: - local Git-only commits (not yet dcommitted) - new SVN commits created since last communication - Best thought of as “last known SVN commit state” ##### SVN annotation / blame - Command: `git svn blame ` - Equivalent to `svn annotate` - Same limitations as `git svn log` - Offline - Only includes commits known as of last SVN interaction ##### SVN server information - Command: `git svn info` - Equivalent to `svn info` - Offline + last-known-state behavior ##### Ignoring what SVN ignores - Problem - SVN ignores are often stored as `svn:ignore` properties - Git users want equivalent ignore behavior to avoid accidentally committing ignored files - Tools - `git svn create-ignore` - Creates corresponding `.gitignore` files in working tree - Intended to be committed on next commit (if desired) - `git svn show-ignore` - Prints ignore rules (stdout) - Useful to keep ignores local-only: - `git svn show-ignore > .git/info/exclude` - Avoids committing `.gitignore` files - Useful if you’re the only Git user and teammates don’t want `.gitignore` artifacts in SVN repo #### Git–SVN summary (what to remember) - `git svn` is valuable when SVN server is unavoidable - Treat it as “crippled Git” - Many Git workflows don’t translate cleanly to SVN’s linear model - Safe-operating guidelines (to avoid confusing SVN / teammates) - Keep a linear Git history; avoid merge commits - Rebase topic work onto mainline; don’t merge it - Don’t collaborate using a parallel Git server - If you use a Git server for faster clones: - don’t push commits 
lacking `git-svn-id` - consider a pre-receive hook to reject commits without `git-svn-id` - If possible: migrate to a real Git server for full benefits ### Git and Mercurial (Hg) — `git-remote-hg` #### Context - DVCS ecosystem includes Git + others; Mercurial is most popular non-Git DVCS - Git and Mercurial are conceptually similar → interoperability is relatively smooth #### Bridge overview: remote helper `git-remote-hg` - Project: https://github.com/felipec/git-remote-hg - Implemented as a Git “remote helper” - Same general mechanism used by Git’s HTTP/S remote support - Benefit - Use standard Git commands (`clone`, `fetch`, `push`) against an Hg-backed remote #### Installation checklist - Install helper script into PATH - `curl -o ~/bin/git-remote-hg https://raw.githubusercontent.com/felipec/git-remote-hg/master/git-remote-hg` - `chmod +x ~/bin/git-remote-hg` - Python dependency - Mercurial library for Python: - `pip install mercurial` - If Python not installed: install from https://www.python.org/ - Mercurial client - Install from https://www.mercurial-scm.org/ #### Getting started (example repository) - Prepare Mercurial “server-side” repo (any Hg repo can be pushed to) - Example: hello world repo - `hg clone http://selenic.com/repo/hello /tmp/hello` - Clone using Git (Hg remote helper prefix) - `git clone hg::/tmp/hello /tmp/hello-git` - Verify history - `git log --oneline --graph --decorate` - You may see many refs displayed; helper creates multiple refs to represent Hg concepts #### Under-the-hood mapping (how Git refs represent Hg concepts) - Inspect actual refs on disk - `tree .git/refs` - Key internal namespaces created by helper - `refs/hg/...` - Holds the “real” remote refs managed by helper - Separates: - Mercurial branches (e.g., `refs/hg/origin/branches/default`) - Mercurial bookmarks (e.g., `refs/hg/origin/bookmarks/master`) - `refs/notes/hg` (on disk: `.git/refs/notes/hg`) - Stores mapping between Git commit hashes and Mercurial changeset IDs - Implemented 
using Git notes (tree of mappings) - Concept - Key: Git commit SHA-1 - Value: Mercurial changeset ID - Practical takeaway - Most users can ignore these implementation details during normal workflows #### Ignoring files (Hg ↔ Git) - Goal - Respect Mercurial ignore rules locally without committing `.gitignore` to an Hg project - Approach - Copy Hg ignore file into Git’s local-only exclude file - `cp .hgignore .git/info/exclude` - Why it works - `.git/info/exclude` behaves like `.gitignore` but is not committed - Hg ignore format is compatible enough for this simple copy in the example #### Typical workflow (clone → commit → fetch/merge → push) - Local work and commits on `master` - Example log: local commits ahead of `origin/master` - Check for remote changes - `git fetch` - May advance `origin/master` (from Hg changes made by others) - Handle divergence - Mercurial supports merges, so you can do a normal Git merge: - `git merge origin/master` - Share work - `git push` - Verify on Mercurial side - `hg log -G --style compact` - Result - Hg changesets created from Git commits appear in Hg history (including merges) #### Branches and bookmarks (concept mapping and operations) - Conceptual differences - Git: one kind of branch (moving ref) - Mercurial: two related concepts - Bookmark: moving pointer (like Git branch) - Branch (heavyweight): branch name stored in each changeset; permanently part of history - Why helper must care - Git can represent both with refs, but Mercurial’s semantics differ ##### Creating Mercurial bookmarks via Git branches - Git side - `git checkout -b featureA` - `git push origin featureA` - Mercurial side - `hg bookmarks` shows bookmark `featureA` - Hg log shows `[featureA]` annotation on appropriate revision - Limitation - Bookmark deletion not supported from Git side (remote helper limitation) ##### Working with Mercurial heavyweight branches via Git - Create branch in Git under the `branches/` namespace - `git checkout -b branches/permanent` 
- commit changes - `git push origin branches/permanent` - Mercurial side - `hg branches` shows `permanent` with tip changeset - `hg log -G` shows: - `branch: permanent` recorded in the changeset itself ##### History rewriting warning (Hg is append-only) - Mercurial generally does not support rewriting published history; it adds new changesets instead - If you do interactive rebase + force-push from Git - New changesets are created - Old changesets remain in repo history - Risk - Can be very confusing to Mercurial users - Guidance - Avoid rewriting history that has left your machine #### Mercurial summary - Working across Git/Hg boundary is typically low-friction - If you avoid rewriting shared history, you may barely notice the remote is Mercurial ### Git and Bazaar (bzr) — `git-remote-bzr` #### Context - Bazaar (GNU Project) is a DVCS but behaves differently from Git - Different keywords for similar operations - Some common Git terms differ in meaning - Branch management is notably different → potential confusion for Git users - Still possible to work on Bazaar repos from Git with a remote helper #### Bridge overview: remote helper `git-remote-bzr` - Project: https://github.com/felipec/git-remote-bzr - Enables `git clone`/`fetch`/`push` against Bazaar repositories #### Installation checklist - Install helper script into PATH - `wget https://raw.github.com/felipec/git-remote-bzr/master/git-remote-bzr -O ~/bin/git-remote-bzr` - `chmod +x ~/bin/git-remote-bzr` - Install Bazaar client (`bzr`) #### Creating a Git repository from a Bazaar repository - Clone using `bzr::` prefix - Recommendation - Don’t attach Git clone to a *local* Bazaar clone - even though both are full clones - Prefer attaching Git clone directly to the *central* Bazaar repository - Example - Remote: `bzr+ssh://developer@mybazaarserver:myproject` - Git clone: - `git clone bzr::bzr+ssh://developer@mybazaarserver:myproject myProject-Git` - `cd myProject-Git` - Post-clone optimization (disk compaction) 
- `git gc --aggressive` - Especially helpful for big repositories #### Bazaar branches and cloning behavior - Bazaar allows cloning branches; a repository may contain multiple branches - `git-remote-bzr` can clone: - A specific branch - `git clone bzr::bzr://bzr.savannah.gnu.org/emacs/trunk emacs-trunk` - All branches in a repository - `git clone bzr::bzr://bzr.savannah.gnu.org/emacs emacs` - Fetch only selected branches - Configure: - `git config remote-bzr.branches 'trunk, xwindow'` - When remote repo does not allow listing branches - Manually specify branch list and fetch - `git init emacs` - `git remote add origin bzr::bzr://bzr.savannah.gnu.org/emacs` - `git config remote-bzr.branches 'trunk, xwindow'` - `git fetch` #### Ignoring files (Bazaar `.bzrignore` ↔ Git ignores) - Core concern - You shouldn’t create/commit `.gitignore` into a Bazaar-managed project - Could disturb Bazaar users - Solution - Use `.git/info/exclude` (local-only ignores) - Implement as: - symbolic link to `.bzrignore`, or - regular file that mirrors `.bzrignore` - Bazaar ignore features beyond Git - `!!` prefix - ignore patterns even if re-included by a later `!` rule - `RE:` prefix - Python regular expression pattern (Git supports only glob patterns) - Two cases - Case A: `.bzrignore` has no `!!` and no `RE:` lines - Safe to symlink: - `ln -s .bzrignore .git/info/exclude` - Case B: `.bzrignore` contains `!!` and/or `RE:` - Must create/edit `.git/info/exclude` manually to match ignore behavior - Ongoing maintenance warning - Must monitor changes to `.bzrignore` - If `.bzrignore` changes to include unsupported syntax: - remove symlink (if used) - copy `.bzrignore` into `.git/info/exclude` - adapt patterns - Git exclusion caveat - In Git, if a parent directory is excluded, you cannot later re-include a file inside it - Be careful translating Bazaar ignore semantics #### Fetching from Bazaar remote (Git-side) - Use normal Git commands - Example (if working on `master`) - `git pull --rebase 
origin` - Merge/rebase your work onto `origin/master` #### Pushing to Bazaar remote (Git-side) - Bazaar supports merge commits - Pushing merge commits is acceptable - Typical flow - work on branches - merge into `master` - push: - `git push origin master` #### Caveats (remote-helper limitations) - Some push operations aren’t supported / behave unexpectedly - Branch deletion: - `git push origin :branch-to-delete` (doesn’t work) - Refspec rename: - `git push origin old:new` (pushes `old`) - Dry-run: - `git push --dry-run origin branch` (will push anyway) #### Bazaar summary - Bazaar and Git are similar enough for reasonable interoperability - Key to success - Know the remote isn’t native Git - Respect remote-helper limitations ### Git and Perforce #### Context - Perforce (1995) — oldest VCS covered in chapter - Designed for constraints of its era - Central server, always connected assumption - Only one version stored locally - Still widely used in corporate settings - Two ways to mix Git with Perforce - Git Fusion (server-side) - git-p4 (client-side) #### Option 1: Perforce Git Fusion (server-side bridge) ##### Overview - Product by Perforce: Git Fusion - http://www.perforce.com/git-fusion - Synchronizes Perforce server with Git repositories on server side - Exposes Perforce depot subtrees as read-write Git repos ##### Setting up Git Fusion (example: Perforce-provided VM) - Installation method used in chapter - Download virtual machine image with Perforce daemon + Git Fusion - http://www.perforce.com/downloads/Perforce/20-User - Import into virtualization software (VirtualBox in example) - First boot configuration prompts - Set passwords for Linux users: - `root`, `perforce`, `git` - Provide instance name (distinguish installations on same network) - Note VM IP address (needed for cloning over HTTPS) - Create a Perforce user (as root on VM) - `p4 -p localhost:1666 -u super user -f john` - Opens editor (VI); accept defaults with `:wq` - `p4 -p localhost:1666 -u john 
passwd` - Enter password twice - `exit` - SSL certificate workaround for example - VM certificate doesn’t match IP → Git rejects HTTPS - Temporary bypass: - `export GIT_SSL_NO_VERIFY=true` - For real installs: install correct certificate per Git Fusion manual - Test clone of sample repo (Talkhouse) - `git clone https:///Talkhouse` - Prompts for credentials (john) - Credential cache helps subsequent commands - Figure reference - Figure 145: Git Fusion virtual machine boot screen (shows IP) ##### Git Fusion configuration (via Perforce client) - Configuration lives in Perforce depot path - `//.git-fusion` directory - Map `//.git-fusion` into a Perforce workspace and browse/edit - Directory structure (high level) - `objects/` - `repos/` and `trees/` (internal object mapping; usually don’t edit) - global `p4gf_config` - per-repo config: `repos//p4gf_config` - user mapping: `users/p4gf_usermap` - Global `p4gf_config` characteristics - INI-style text file - Global defaults; can be overridden by repo-specific configs - Key sections shown (examples) - `[repo-creation] charset = utf8` - `[git-to-perforce]` - `change-owner = author` - `enable-git-branch-creation = yes` - `enable-swarm-reviews = yes` - `enable-git-merge-commits = yes` - `enable-git-submodules = yes` - `preflight-commit = none` - `ignore-author-permissions = no` - `read-permission-check = none` - `git-merge-avoidance-after-change-num = 12107` - `[perforce-to-git]` (`http-url`, `ssh-url`) - `[@features]` feature flags (imports, chunked-push, matrix2, parallel-push) - `[authentication] email-case-sensitivity = no` - Repo-specific `p4gf_config` - Contains `[@repo]` section with per-repo overrides - Contains Perforce-branch ↔ Git-branch mappings via named sections ##### Branch mapping and view mappings (Git Fusion) - Mapping section example - `[Talkhouse-master]` - `git-branch-name = master` - `view = //depot/Talkhouse/main-dev/... 
...` - Purpose of settings - `git-branch-name` - Choose friendlier Git branch names (avoid awkward Perforce paths) - `view` - Defines how Perforce files map into the Git repository - Uses standard Perforce view mapping syntax - Multi-project mapping example - One Git branch can combine multiple Perforce depots/subtrees into subdirectories - Example view: - `//depot/project1/main/... project1/...` - `//depot/project2/mainline/... project2/...` ##### User identity mapping (Git Fusion: `users/p4gf_usermap`) - Purpose - Map Perforce users to Git author identities (and vice versa) - Default mapping behavior (without usermap) - Perforce → Git - Look up Perforce user; use stored full name + email in Git commit - Git → Perforce - Look up Perforce user by email in Git commit author field - Submit changeset as that Perforce user (permissions apply) - Mapping file line format - ` ""` - Use cases - Multiple emails mapping to one Perforce account - Supports commits authored under different emails but attributed to same Perforce user - Anonymization / masking internal directory - Replace real names/emails with fictional/anonymous ones in exported Git commits - Matching behavior detail - When creating Git commit from Perforce changeset: - first matching line for Perforce user supplies Git author info - Uniqueness recommendation - email + full name should be unique unless intentionally collapsing attribution ##### Workflow with Git Fusion (from the Git side) - Clone a Git Fusion repository (example Jam) - `git clone https:///Jam` - Initial clone behavior - Git Fusion converts applicable Perforce changesets → Git commits on server - Takes time proportional to history size - Later fetches are incremental and feel more native-speed - Result feels like a normal Git repo - Typical refs: - `master` - `origin/master`, `origin/rel2.1`, etc. 
- Standard Git workflow applies - Make commits locally - `git fetch` to update remote-tracking branches - `git merge origin/master` to integrate updates - `git push` to publish back - Push mechanics (visible output) - Git Fusion runs conversion back into Perforce: - loads commit tree - finds child commits - runs `git fast-export` - checks commits - copies changelists - submits new Git commit objects to Perforce - Note: processing may continue even if connection closes - Perforce-side visualization - p4v revision graph shows merge structure akin to Git - If Perforce lacks a named branch for Git-side commits - Git Fusion creates an “anonymous” branch under `.git-fusion` to hold them - Figure reference - Figure 146: Perforce revision graph resulting from Git push ##### Git Fusion summary - Advantages - First-class interoperability when server admin can install it - Supports many “full Git” features comfortably - merge commits → recorded as Perforce integrations - submodules (though may look odd to Perforce users) - Limitations - Will reject rewriting history that has already been pushed - If Git Fusion not possible - Use client-side `git-p4` #### Option 2: `git-p4` (client-side Perforce bridge) ##### Overview - Two-way bridge between Git and Perforce - Runs entirely inside your Git repository - No special Perforce server configuration required - Less flexible/comprehensive than Git Fusion - But “good enough” for many workflows ##### Prerequisites / notes - Requires `p4` CLI tool in your PATH - Free download (as referenced in chapter): - http://www.perforce.com/downloads/Perforce/20-User - Must set environment variables for Perforce connection (example) - `export P4PORT=10.0.1.254:1666` - `export P4USER=john` ##### Getting started: cloning from Perforce - Command - `git p4 clone //depot/www/live www-shallow` - Result characteristics - “Shallow” import by default - imports only latest Perforce revision (`#head`) - aligns with Perforce’s “not everyone has all history” 
model - Git view after clone - local `master` - Perforce state refs: - `p4/master` - `p4/HEAD` - Important nuance: no Git remotes created - `git remote -v` → no remotes - Perforce state is represented as refs, not a Git-managed remote ##### Workflow: sync, rebase, submit - Local development - commit locally on `master` - Get latest from Perforce - `git p4 sync` - incremental import into `refs/remotes/p4/master` - Keep history linear before submitting - divergence between `master` and `p4/master` is possible - recommended: rebase local commits on top of Perforce head - shortcut: - `git p4 rebase` - effectively: `git p4 sync` + `git rebase p4/master` - (with extra smarts for multi-branch situations) - Submit work back to Perforce - `git p4 submit` - creates a Perforce changelist per Git commit between `p4/master` and `master` - opens editor for each changelist specification - imports Git commit message into Perforce change description - includes diff content for context - Authorship mismatch warning (during submit) - If Git author email doesn’t match your Perforce account: - message suggests: - `--preserve-user` to modify authorship - set `git-p4.skipUserNameCheck` to hide warning - After submit completes - git-p4 performs another incremental import - rebases current branch onto `p4/master` - effect resembles a `git push` workflow - Commit rewriting - Submitted commits’ SHA-1 hashes change - git-p4 appends metadata line to commit message, e.g.: - `[git-p4: depot-paths = "//depot/www/live/": change = 12144]` - Squashing strategy - To combine multiple Git commits into one Perforce changeset: - interactive rebase (squash) before `git p4 submit` ##### What about merge commits? 
- Perforce branching model differs; merge commits aren’t meaningful in Perforce changelist history - `git p4 submit` behavior with merge commits - ignores merge commits - applies only the non-merge commits that aren’t in Perforce yet - Net effect - history becomes linear on submission (as though you rebased) - Practical implication - You can branch and merge freely in Git locally - As long as you can rebase/linearize before submitting - Caveat - Perforce integration metadata (branch lineage) is not preserved; only file-level changes are recorded ##### Branching with `git-p4` - Example Perforce depot layout - `//depot/project/main` - `//depot/project/dev` - Example Perforce branch spec view - `//depot/project/main/... //depot/project/dev/...` - Clone with branch detection - `git p4 clone --detect-branches //depot/project@all` - `@all` imports all changesets that ever touched those paths (full history) - imports additional branches (e.g., `project/dev`) - updates branches list (e.g., `main dev`) - When Perforce branch specs aren’t present - Configure branch relationships manually - `git init project` - `cd project` - `git config git-p4.branchList main:dev` - declares `main` and `dev`; `dev` is child of `main` - `git p4 clone --detect-branches //depot/project@all .` - Working with detected branches - Create local branch from Perforce branch ref - `git checkout -b dev p4/project/dev` - `git p4 submit` targets correct Perforce branch automatically - Limitations and operational constraints - Cannot mix shallow clones with multiple branches - For huge projects needing multiple submit targets - may need one `git p4 clone` per branch to submit to - Branch creation/integration must be done with Perforce tools - git-p4 can only sync/submit to existing branches - can only submit one linear changeset at a time - merge/integration metadata is lost if merging in Git ##### Git + Perforce summary - `git-p4` enables Git-style local workflow with Perforce as server-of-record - Be careful about sharing 
Git commits - Don’t push commits to shared Git remotes unless already submitted to Perforce - If possible and approved by admin - Git Fusion provides more seamless, first-class integration ## Part 2 — Migrating to Git (converting repositories into native Git) ### Why migrate - Adopt Git as primary VCS for an existing codebase - Goals - Preserve history as much as possible - Clean up author/branch/tag data during conversion - Strategy - Use system-specific importers when available - Otherwise use `git fast-import` with a custom converter ### Migrating from Subversion (SVN) #### Simple path (but imperfect) - Use `git svn clone` to import - Stop using SVN and push resulting Git repo to a new Git server - Caveat - Import can be imperfect; takes long anyway → worth doing a cleaner import #### Author mapping (SVN usernames → Git identities) - Problem - SVN records commit “author” as a username on the SVN system - Git prefers full identity: `Full Name ` - Create `users.txt` mapping file - Format: - `svnuser = Full Name ` - Example: - `schacon = Scott Chacon ` - `selse = Someo Nelse ` - Generate initial list of SVN author names - `svn log --xml --quiet | grep author | sort -u | perl -pe 's/.*>(.*?)<.*/$1 = /'` - Then redirect output into `users.txt` and fill in names/emails - Windows note - Migration steps may require special tooling; referenced guidance: - https://docs.microsoft.com/en-us/azure/devops/repos/git/perform-migration-from-svn-to-git #### Cleaner `git svn clone` for migration - Recommended command pattern - `git svn clone http://my-project.googlecode.com/svn/ \` - `--authors-file=users.txt` - `--no-metadata` - `--prefix ""` - `-s` - `my_project` - Option rationale - `--authors-file` - improves Author field quality in Git commits - `--no-metadata` - removes `git-svn-id` lines in commit messages (cleaner logs) - WARNING: keep metadata if you intend to mirror back to original SVN repo - `--prefix ""` - avoids extra ref prefixes from import - `-s` - assumes 
standard SVN trunk/branches/tags layout #### Post-import cleanup (make imported refs idiomatic Git) - Convert SVN tags (remote refs) into real Git tags - Problem - `git svn` stores tags as remote refs under `refs/remotes/tags/...` - Conversion loop (creates lightweight tags and deletes remote tag refs) - `for t in $(git for-each-ref --format='%(refname:short)' refs/remotes/tags); do` - `git tag ${t/tags\//} $t && git branch -D -r $t;` - `done` - Convert remaining remote refs into local branches - `for b in $(git for-each-ref --format='%(refname:short)' refs/remotes); do` - `git branch $b refs/remotes/$b && git branch -D -r $b;` - `done` - Remove peg-revision branches (optional cleanup) - Symptom - extra branches suffixed with `@` (SVN “peg-revisions”) - If you don’t need them: - `for p in $(git for-each-ref --format='%(refname:short)' | grep @); do` - `git branch -D $p;` - `done` - Remove redundant `trunk` branch - `git svn` often creates `trunk` ref that points where `master` points - Remove: - `git branch -d trunk` #### Push migrated repo to Git server - Add remote - `git remote add origin git@my-git-server:myrepository.git` - Push all branches - `git push origin --all` - Push tags - `git push origin --tags` ### Migrating from Mercurial (Hg) #### Why it’s straightforward - Git and Mercurial data models are similar - Git is flexible in representing refs/tags #### Tool: `hg-fast-export` - Acquire tool - `git clone https://github.com/frej/fast-export.git` #### Steps - Full clone the Mercurial repo to convert - `hg clone /tmp/hg-repo` - Create author mapping file (optional cleanup but often necessary) - Generate list: - `cd /tmp/hg-repo` - `hg log | grep user: | sort | uniq | sed 's/user: *//' > ../authors` - Convert each line into rule syntax: - `""=""` - Notes - Mercurial allows looser author strings than Git - Mapping file can normalize duplicates, fix invalid formats - Supports Python `string_escape` sequences in mapping strings - Unmatched inputs pass through 
unchanged - Also usable to rename branches/tags if Mercurial names invalid in Git - Branch mapping: `-B` - Tag mapping: `-T` - Create a new Git repository and run export - `git init /tmp/converted` - `cd /tmp/converted` - `/tmp/fast-export/hg-fast-export.sh -r /tmp/hg-repo -A /tmp/authors` - Output expectations - Exporter reports per-revision progress and file deltas - Mercurial tags exported to Git tags - Mercurial branches/bookmarks become Git branches - Ends with `git-fast-import` statistics - Validate author consolidation - `git shortlog -sn` - Publish to Git server - `git remote add origin git@my-git-server:myrepository.git` - `git push origin --all` ### Migrating from Bazaar (bzr) #### Tooling: Bazaar fast-export → Git fast-import - Requires `bzr-fastimport` plugin (and Python module dependencies) #### Install `bzr-fastimport` - Linux/Unix-like (preferred: package manager) - Debian/Ubuntu: - `sudo apt-get install bzr-fastimport` - RHEL: - `sudo yum install bzr-fastimport` - Fedora 22+: - `sudo dnf install bzr-fastimport` - If package unavailable: install plugin manually - `mkdir --parents ~/.bazaar/plugins` - `cd ~/.bazaar/plugins` - `bzr branch lp:bzr-fastimport fastimport` - `cd fastimport` - `sudo python setup.py install --record=files.txt` - Ensure Python module `fastimport` is present - Check: - `python -c "import fastimport"` - If missing: - `pip install fastimport` - Source: - https://pypi.python.org/pypi/fastimport/ - Windows - Standalone/default Bazaar install includes `bzr-fastimport` (no extra steps) #### Import scenarios ##### Single-branch Bazaar project - `cd /path/to/the/bzr/repository` - Initialize Git - `git init` - Export + import - `bzr fast-export --plain . 
| git fast-import` - Expected time - seconds to minutes depending on repo size ##### Bazaar repository with multiple branches (main + working branch) - Example branch directories - `myProject.trunk` (main) - `myProject.work` (working branch) - Create Git repo - `git init git-repo` - `cd git-repo` - Import trunk as Git master (with marks) - `bzr fast-export --export-marks=../marks.bzr ../myProject.trunk | \` - `git fast-import --export-marks=../marks.git` - Import work branch as Git branch `work` (reusing marks) - `bzr fast-export --marks=../marks.bzr --git-branch=work ../myProject.work | \` - `git fast-import --import-marks=../marks.git --export-marks=../marks.git` - Verify - `git branch` should show `master` and `work` - Inspect logs - Remove mark files (`marks.bzr`, `marks.git`) after confirmation #### Synchronize working directory + index after import - Issue - staging area may not match HEAD - working directory may not match HEAD after multi-branch import - Fix - `git reset --hard HEAD` #### Convert ignore rules (.bzrignore → .gitignore) - Rename ignore file - `git mv .bzrignore .gitignore` - If `.bzrignore` uses Bazaar-only constructs (`!!`, `RE:`) - modify `.gitignore` (possibly multiple `.gitignore` files) to match behavior - Commit this conversion as part of migration - `git commit -am 'Migration from Bazaar to Git'` #### Publish to Git server - `git remote add origin git@my-git-server:mygitrepository.git` - `git push origin --all` - `git push origin --tags` ### Migrating from Perforce #### Approach A: Perforce Git Fusion - Configure project, branches, and user mappings in Git Fusion - Clone Git Fusion repo (appears native Git) - Push to a native Git host if desired - Optionally, Perforce (via Git Fusion) can continue to host Git repos #### Approach B: `git-p4` as an import tool - Example: import Jam from Perforce Public Depot - Set Perforce server - `export P4PORT=public.perforce.com:1666` - Import full history of subtree (`@all`) - `git-p4 clone 
//guest/perforce_software/jam@all p4import` - Branches - Use `--detect-branches` if you want multiple branches (when available/configured) - Inspect imported history - `git log` - Commits include Perforce change marker line: - `[git-p4: depot-paths = "...": change = N]` - Optional cleanup: remove git-p4 marker lines (do this before new work) - `git filter-branch --msg-filter 'sed -e "/^\[git-p4:/d"'` - Effect - rewrites commit history; SHA-1 hashes change - Publish to new Git server (after cleanup/verification) ### A custom importer (when no prebuilt tool exists) — `git fast-import` #### When to use - No quality importer exists for your legacy VCS or storage format - You need customized mapping/cleanup beyond available tools #### Why `git fast-import` - Accepts a simple, line-oriented instruction stream on stdin - Efficiently creates Git objects (blobs/trees/commits/refs/tags) - Much easier than - invoking raw/plumbing commands per object, or - writing raw Git objects directly #### Example data source: timestamped directory backups - Source directory structure - `back_YYYY_MM_DD/` (snapshots) - `current/` (latest snapshot) - Goal - Import each snapshot as a commit in a linear history - Each commit represents full tree state at that snapshot #### Git storage reminder (mapping problem to solution) - Git history is a linked list (DAG) of commit objects - Each commit points to a snapshot (tree) - So importer must emit - tree content for each snapshot - commit metadata + parent linkage - order of commits #### Strategy for the example importer - Walk snapshot directories in order - For each snapshot: - create a new commit - link it to previous commit (parent) - wipe tree (`deleteall`) and re-add all files (full snapshot approach) - Notes - fast-import also supports delta-style imports (add/modify/delete only), but that’s more complex #### Ruby implementation (key pieces) - Language choice - Ruby used for readability and convenience - Any language works if it can output 
proper fast-import stream - Windows newline caution - `git fast-import` expects LF (not CRLF) - Ruby fix: - `$stdout.binmode` ##### Main loop (iterate snapshots) - Code shape - `last_mark = nil` - `Dir.chdir(ARGV[0]) do` - `Dir.glob("*").each do |dir|` - `next if File.file?(dir)` - `Dir.chdir(dir) do` - `last_mark = print_export(dir, last_mark)` - `end` - `end` - `end` ##### Marks (fast-import commit identifiers) - Definition - “mark” is an integer ID used to reference commits within fast-import stream - Implementation: map directory names to sequential integers - Global: `$marks = []` - `convert_dir_to_mark(dir)` - add dir to `$marks` if not already present - return `($marks.index(dir) + 1).to_s` ##### Dates (commit timestamps from directory names) - Need integer timestamp for committer line - `convert_dir_to_date(dir)` - if `dir == 'current'` → `Time.now().to_i` - else - strip prefix `back_` - parse `year, month, day` - use `Time.local(year, month, day).to_i` ##### Author/committer identity - Hardcoded for example - `$author = 'John Doe <john@example.com>'` ##### Fast-import commit record structure (what gets printed) - For each snapshot commit: - `commit refs/heads/master` - `mark :<mark>` - `committer <author> <date> -0700` - timezone hardcoded as `-0700` in example - commit message via `data` directive: - `"imported from <dir>"` - parent link (except first commit): - `from :<last_mark>` - tree content: - `deleteall` - for each file: `M <mode> inline <path>` + inline `data` (file content) ##### Helper: exporting data blocks (`data <size>\n<contents>`) - Used for both - commit messages - file contents - `export_data(string)` - prints: - `data #{string.size}\n#{string}` - caveat: fast-import reads `<size>` as a byte count, so `string.bytesize` is safer than `string.size` for non-ASCII content ##### Helper: writing a file blob inline - `inline_data(file, code = 'M', mode = '644')` - `content = File.read(file)` - `puts "#{code} #{mode} inline #{file}"` - `export_data(content)` - Mode notes - `644` for normal files - must detect executables and use `755` when needed ##### `print_export(dir, last_mark)` responsibilities - Compute metadata - `date =
convert_dir_to_date(dir)` - `mark = convert_dir_to_mark(dir)` - Print commit header + metadata + message - Print parent link if present - Print `deleteall` - Walk all files in snapshot - `Dir.glob("**/*")` - `next if !File.file?(file)` - `inline_data(file)` - Return `mark` to become next iteration’s `last_mark` ##### Full script structure (as presented) - Shebang - `#!/usr/bin/env ruby` - Windows newline fix - `$stdout.binmode` - Globals - `$author = "John Doe <john@example.com>"` - `$marks = []` - Functions - `convert_dir_to_mark` - `convert_dir_to_date` - `export_data` - `inline_data` - `print_export` - Main loop (iterates snapshot directories, updating `last_mark`) #### Running the importer - Create target Git repo - `git init` - Pipe importer output into `git fast-import` - `ruby import.rb /opt/import_from | git fast-import` - Successful run yields - `git-fast-import statistics` summary (objects, branches, marks, memory, etc.) - Verify commit history - `git log` - Working tree behavior - After import, nothing is checked out by default - Populate working directory: - `git reset --hard master` #### Extending beyond the example - `git fast-import` can handle - file mode changes (e.g., executable bits) - binary data - multiple branches - merges - tags - progress indicators - Reference - examples in Git source: `contrib/fast-import/` ## Chapter wrap-up (Summary) - You can use Git effectively even when the central system is not Git - via bridges/remote helpers (`git svn`, `git-remote-hg`, `git-remote-bzr`, Git Fusion, `git-p4`) - You can migrate repositories from common VCS into native Git - SVN, Mercurial, Bazaar, Perforce - plus custom sources via `git fast-import` - Next step (as hinted in chapter) - understanding Git internals enables even more precise control over repository data ```
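
For the SVN author-mapping step in the outline, the `grep | sort -u | perl` pipeline can also be sketched in Ruby, the language the chapter uses for its custom importer. This is only an illustration: `authors_template` is a hypothetical helper name, and it assumes the input is the XML text produced by `svn log --xml --quiet`.

```ruby
# Sketch only: authors_template is a hypothetical helper, not part of git or svn.
# Input:  the XML text printed by `svn log --xml --quiet`.
# Output: one "username = " line per distinct author, sorted, so that the
#         right-hand side (full name and email) can be filled in by hand.
def authors_template(svn_log_xml)
  svn_log_xml.scan(%r{<author>(.*?)</author>}) # one capture group per match
             .flatten
             .uniq
             .sort
             .map { |name| "#{name} = " }
             .join("\n")
end
```

Piping the log into a small script that calls `authors_template($stdin.read)` and redirecting the result into `users.txt` would give the same starting point as the shell pipeline.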
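
The commit record structure listed for the custom importer can be sketched as a function that returns the stream text instead of printing it, which makes it easier to inspect. This is a minimal sketch under stated assumptions, not the chapter's script: `data_block` and `commit_record` are hypothetical names, every file gets mode `644`, and the `-0700` offset is hardcoded as in the outline. It uses `bytesize` because fast-import reads the `data` count in bytes.

```ruby
# Sketch only: data_block and commit_record are hypothetical helper names.

# A `data` block is the payload length in bytes, a newline, then the payload.
def data_block(payload)
  "data #{payload.bytesize}\n#{payload}"
end

# Builds one full-snapshot commit record for the fast-import stream.
# files maps path => contents; mode is fixed at 644 for simplicity.
def commit_record(mark:, author:, date:, message:, parent: nil, files: {})
  lines = []
  lines << "commit refs/heads/master"
  lines << "mark :#{mark}"
  lines << "committer #{author} #{date} -0700"
  lines << data_block(message)
  lines << "from :#{parent}" if parent # omitted for the first commit
  lines << "deleteall"                 # wipe the tree, then re-add every file
  files.each do |path, contents|
    lines << "M 644 inline #{path}"
    lines << data_block(contents)
  end
  lines.join("\n") + "\n"
end
```

Printing the result of `commit_record` for each snapshot, in order, to a binary-mode `$stdout` would produce a stream of the same shape as the one the outline describes.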
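
The mark, date, and file-mode rules from the importer outline can be sketched as three small helpers. The names `mark_for`, `snapshot_time`, and `file_mode` are hypothetical, and the snapshot directory naming (`back_YYYY_MM_DD` plus `current`) is taken from the example above; treat this as one possible shape rather than the chapter's code.

```ruby
# Sketch only: all helper names below are hypothetical.

MARKS = [] # directory names, in the order they were first seen

# Returns a stable 1-based integer mark for a snapshot directory.
def mark_for(dir)
  MARKS << dir unless MARKS.include?(dir)
  MARKS.index(dir) + 1
end

# Converts "back_YYYY_MM_DD" to epoch seconds (local midnight);
# the special directory "current" maps to the supplied `now`.
def snapshot_time(dir, now: Time.now)
  return now.to_i if dir == "current"
  year, month, day = dir.sub("back_", "").split("_").map(&:to_i)
  Time.local(year, month, day).to_i
end

# Picks the fast-import file mode: 755 for executables, 644 otherwise.
def file_mode(path)
  File.executable?(path) ? "755" : "644"
end
```

Keeping marks as integers until the stream is written leaves the formatting (`:1`, `:2`, and so on) to the code that emits the `mark` and `from` lines.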