# Appendix B: Embedding Git in your Applications

## Why embed / integrate Git

- Target audience for integration
  - Developer-focused applications
    - likely benefit from integration with source control
  - Non-developer applications
    - example: document editors
    - can benefit from version-control features
- Why Git specifically
  - Git’s model works very well for many different scenarios

## Two main integration options

- Option A: spawn a shell and call the `git` command-line program
- Option B: embed a Git library into your application
- This appendix covers
  - command-line integration
  - several of the most popular embeddable Git libraries

## Command-line Git (calling the `git` CLI)

- What it is
  - spawn a shell process
  - use the Git command-line tool to do the work
- Benefits
  - canonical behavior
    - all of Git’s features are supported
  - fairly easy to implement
    - most runtime environments can invoke a process with command-line arguments
- Downsides
  - Output is plain text
    - you must parse Git’s output to read progress/results
    - Git’s output format can change occasionally
    - parsing can be inefficient and error-prone
  - Lack of error recovery
    - if repository is corrupted
    - or user has malformed configuration value
    - Git may refuse to perform many operations
  - Process management complexity
    - must maintain a shell environment in a separate process
    - coordinating many processes can be challenging
      - especially if multiple processes may access the same repository

## Libgit2

- What it is
  - dependency-free implementation of Git
  - focus: a nice API for use within other programs
  - website: https://libgit2.org

### Libgit2 C API (whirlwind tour)

- Example flow shown
  - Open a repository
    - `git_repository *repo;`
    - `int error = git_repository_open(&repo, "/path/to/repository");`
  - Dereference `HEAD` to a commit
    - `git_object *head_commit;`
    - `error = git_revparse_single(&head_commit, repo, "HEAD^{commit}");`
    - `git_commit *commit = (git_commit*)head_commit;`
  - Print commit properties
    - `printf("%s", git_commit_message(commit));`
    - `const git_signature *author = git_commit_author(commit);`
    - `printf("%s <%s>\n", author->name, author->email);`
    - `const git_oid *tree_id = git_commit_tree_id(commit);`
  - Cleanup
    - `git_commit_free(commit);`
    - `git_repository_free(repo);`
- Repository opening details
  - `git_repository` type
    - handle to a repository with an in-memory cache
  - `git_repository_open`
    - simplest method when you know exact path to working directory or `.git` folder
  - other APIs mentioned
    - `git_repository_open_ext`
      - includes options for searching
    - `git_clone` (and friends)
      - make a local clone of a remote repository
    - `git_repository_init`
      - create an entirely new repository
- Dereferencing `HEAD` details
  - rev-parse usage
    - uses rev-parse syntax
    - reference: “see Branch References for more on this”
  - return type
    - `git_revparse_single` returns a `git_object*`
      - represents something that exists in the repository’s Git object database
    - `git_object` is a “parent” type for several object kinds
      - child types share the same memory layout as `git_object`
      - safe to cast to the correct “child” type when appropriate
  - cast safety note in this example
    - `git_object_type(commit)` would return `GIT_OBJ_COMMIT`
    - therefore it’s safe to cast to `git_commit*`
- Commit property access details
  - message
    - `git_commit_message(commit)`
  - author signature
    - `git_commit_author(commit)` returns `const git_signature *`
    - fields shown
      - `author->name`
      - `author->email`
  - tree id
    - `git_commit_tree_id(commit)` returns a `const git_oid *`
    - `git_oid`
      - Libgit2 representation for a SHA-1 hash

### Patterns illustrated by the Libgit2 C sample

- Error-code style
  - pattern: declare pointer, pass its address into a Libgit2 call
  - return value: integer error code
    - `0` = success
    - `< 0` = error
- Memory / ownership rules
  - if Libgit2 populates a pointer for you
    - you must free it
  - if Libgit2 returns a `const` pointer
    - you don’t free it
    - it becomes invalid when the owning
      object is freed
- Practical note
  - “Writing C is a bit painful.”

### Language bindings (Libgit2 ecosystem)

- Implication of “writing C is painful”
  - you’re unlikely to write C when using Libgit2
  - there are language-specific bindings that make integration easier

#### Ruby bindings: Rugged

- Name: Rugged
- URL: https://github.com/libgit2/rugged
- Example equivalent to the C code
  - `repo = Rugged::Repository.new('path/to/repository')`
  - `commit = repo.head.target`
  - `puts commit.message`
  - `puts "#{commit.author[:name]} <#{commit.author[:email]}>"`
  - `tree = commit.tree`
- Why it’s “less cluttered”
  - error handling
    - Rugged uses exceptions
    - examples mentioned: `ConfigError`, `ObjectError`
  - resource management
    - no explicit freeing
    - Ruby is garbage-collected
- Example: crafting a commit from scratch (Rugged)
  - Code sequence shown (with numbered markers)
    - ① create a new blob
      - `blob_id = repo.write("Blob contents", :blob) ①`
    - work with index
      - `index = repo.index`
      - `index.read_tree(repo.head.target.tree)`
    - ② add a new file entry
      - `index.add(:path => 'newfile.txt', :oid => blob_id) ②`
    - build a signature hash
      - `sig = {`
      - ` :email => "bob@example.com",`
      - ` :name => "Bob User",`
      - ` :time => Time.now,`
      - `}`
    - create the commit (with parameters)
      - `commit_id = Rugged::Commit.create(repo,`
      - ` :tree => index.write_tree(repo), ③`
      - ` :author => sig,`
      - ` :committer => sig, ④`
      - ` :message => "Add newfile.txt", ⑤`
      - ` :parents => repo.empty? ?
        [] : [ repo.head.target ].compact, ⑥`
      - ` :update_ref => 'HEAD', ⑦`
      - `)`
    - ⑧ look up the created commit object
      - `commit = repo.lookup(commit_id) ⑧`
  - Meaning of each numbered step (①–⑧)
    - ① Create a new blob
      - contains the contents of a new file
    - ② Populate index and add file
      - populate index with head commit’s tree
      - add the new file at path `newfile.txt`
    - ③ Create a new tree in the ODB
      - uses it for the new commit
    - ④ Author and committer fields
      - same signature used for both
    - ⑤ Commit message
      - `"Add newfile.txt"`
    - ⑥ Parents
      - when creating a commit, you must specify parents
      - uses the tip of `HEAD` for the single parent
      - handles empty repository case
    - ⑦ Update a ref (optional)
      - Rugged (and Libgit2) can optionally update a reference when making a commit
      - here it updates `HEAD`
    - ⑧ Return value / lookup
      - the return value is the SHA-1 hash of the new commit object
      - you can use it to get a `Commit` object
- Performance note
  - Ruby code is clean
  - Libgit2 does heavy lifting → runs pretty fast
- Pointer to later section
  - “If you’re not a rubyist, we touch on some other bindings in Other Bindings.”

## Advanced Functionality (Libgit2)

- Out-of-core-Git capabilities
  - Libgit2 has capabilities outside the scope of core Git
- Example capability: pluggability
  - can provide custom “backends” for several operation types
  - enables storage in a different way than stock Git
  - backend types mentioned
    - configuration
    - ref storage
    - object database
    - “among other things”

### Custom backend example: object database (ODB)

- Example source
  - from Libgit2 backend examples
  - URL: https://github.com/libgit2/libgit2-backends
- Setup shown (with numbered markers)
  - ① create ODB “frontend”
    - `git_odb *odb;`
    - `int error = git_odb_new(&odb); ①`
    - meaning: initialize empty ODB frontend container for backends
  - ② initialize custom backend
    - `git_odb_backend *my_backend;`
    - `error = git_odb_backend_mine(&my_backend, /*…*/); ②`
  - ③ add backend to frontend
    - `error =
      git_odb_add_backend(odb, my_backend, 1); ③`
  - open a repository
    - `git_repository *repo;`
    - `error = git_repository_open(&repo, "some-path");`
  - ④ set repository to use custom ODB
    - `error = git_repository_set_odb(repo, odb); ④`
    - meaning: repo uses this ODB to look up objects
- Note about the example’s error handling
  - errors are captured but not handled
  - “We hope your code is better than ours.”

### Implementing `git_odb_backend_mine`

- What it is
  - constructor for your own ODB implementation
- Requirement
  - fill in the `git_odb_backend` structure properly
- Example struct layout shown
  - `typedef struct {`
  - ` git_odb_backend parent;`
  - ` // Some other stuff`
  - ` void *custom_context;`
  - `} my_backend_struct;`
- Subtle memory-layout constraint
  - `my_backend_struct`’s first member must be a `git_odb_backend` structure
  - ensures Libgit2 sees the memory layout it expects
- Flexibility
  - the rest of the struct is arbitrary
  - can be as large or small as needed
- Example initialization function responsibilities shown
  - allocate
    - `backend = calloc(1, sizeof (my_backend_struct));`
  - set custom context
    - `backend->custom_context = …;`
  - fill supported function pointers in `parent`
    - `backend->parent.read = &my_backend__read;`
    - `backend->parent.read_prefix = &my_backend__read_prefix;`
    - `backend->parent.read_header = &my_backend__read_header;`
    - `// …`
  - return it through output parameter
    - `*backend_out = (git_odb_backend *) backend;`
  - return success constant
    - `return GIT_SUCCESS;`
- Where to find full signatures
  - Libgit2 source file:
    - `include/git2/sys/odb_backend.h`
  - which signatures to implement depends on use case

## Other Bindings (Libgit2)

- Breadth
  - bindings exist for many languages
- Section purpose
  - show small examples using a few more complete bindings packages (as of writing)
- Other languages mentioned as having libraries (various maturity)
  - C++
  - Go
  - Node.js
  - Erlang
  - JVM
- Official collection of bindings
  - browse repos: https://github.com/libgit2
- Common goal for the code in this section
  - return the commit message from the commit eventually pointed to by `HEAD`
  - “sort of like `git log -1`”

### LibGit2Sharp

- For
  - .NET or Mono applications
- URL
  - https://github.com/libgit2/libgit2sharp
- Characteristics
  - bindings written in C#
  - wraps raw Libgit2 calls with native-feeling CLR APIs
- Example program (single expression)
  - `new Repository(@"C:\path\to\repo").Head.Tip.Message;`
- Desktop Windows note
  - NuGet package available to get started quickly

### objective-git

- Platform context
  - Apple platform
  - likely using Objective-C as implementation language
- URL
  - https://github.com/libgit2/objective-git
- Example program outline
  - initialize repo
    - `GTRepository *repo =`
    - ` [[GTRepository alloc] initWithURL:[NSURL fileURLWithPath: @"/path/to/repo"]`
    - `error:NULL];`
  - retrieve commit message
    - `NSString *msg = [[[repo headReferenceWithError:NULL] resolvedTarget] message];`
- Swift note
  - objective-git is fully interoperable with Swift

### pygit2

- What it is
  - Python bindings for Libgit2
- URL
  - https://www.pygit2.org
- Example program (chained calls)
  - `pygit2.Repository("/path/to/repo") # open repository`
  - `.head # get the current branch`
  - `.peel(pygit2.Commit) # walk down to the commit`
  - `.message # read the message`

## Further Reading (Libgit2)

- Scope note
  - full treatment of Libgit2 capabilities is outside the scope of the book
- Libgit2 resources
  - API documentation: https://libgit2.github.com/libgit2
  - guides: https://libgit2.github.com/docs
- Other bindings
  - check bundled README and tests
  - often have small tutorials and pointers to further reading

## JGit

- Purpose
  - use Git from within a Java program
- What it is
  - fully featured Git library called JGit
  - relatively full-featured implementation of Git written natively in Java
  - widely used in the Java community
  - under the Eclipse umbrella
- Home
  - https://www.eclipse.org/jgit/

### Getting Set Up (JGit)

- Multiple ways to connect project to
  JGit
- Easiest path: Maven
  - add dependency snippet to `<dependencies>` in `pom.xml`
    - `<dependency>`
    - ` <groupId>org.eclipse.jgit</groupId>`
    - ` <artifactId>org.eclipse.jgit</artifactId>`
    - ` <version>3.5.0.201409260305-r</version>`
    - `</dependency>`
  - version note
    - likely advanced by the time you read this
    - check updates:
      - https://mvnrepository.com/artifact/org.eclipse.jgit/org.eclipse.jgit
  - result
    - Maven automatically acquires and uses the JGit libraries you need
- Manual dependency management
  - pre-built binaries
    - https://www.eclipse.org/jgit/download
  - compile/run examples
    - `javac -cp .:org.eclipse.jgit-3.5.0.201409260305-r.jar App.java`
    - `java -cp .:org.eclipse.jgit-3.5.0.201409260305-r.jar App`

### Plumbing (JGit)

- Two levels of API
  - plumbing
  - porcelain
- Terminology source: Git itself
  - porcelain APIs
    - friendly front-end for common user-level actions
    - like what a normal user would use the Git command-line tool for
  - plumbing APIs
    - interact with low-level repository objects directly

#### Starting point: `Repository`

- Starting point for most JGit sessions
  - class: `Repository`
- Creating/opening a filesystem-based repository
  - note: JGit also allows other storage models
- Create new repository
  - `Repository newlyCreatedRepo = FileRepositoryBuilder.create(new File("/tmp/new_repo/.git"));`
  - `newlyCreatedRepo.create();`
- Open existing repository
  - `Repository existingRepo = new FileRepositoryBuilder()`
  - `.setGitDir(new File("my_repo/.git"))`
  - `.build();`

#### `FileRepositoryBuilder` (finding repositories)

- Builder style
  - fluent API
- Helps locate a Git repository
  - whether or not your program knows exactly where it’s located
- Methods/strategies mentioned
  - environment variables
    - `.readEnvironment()`
  - search starting from working directory
    - `.setWorkTree(…).findGitDir()`
  - open known `.git` directory
    - `.setGitDir(...)` (as in example)

#### Plumbing API: quick sampling + explanations

- Sampling actions shown (code outline)
  - Get a reference
    - `Ref master = repo.getRef("master");`
  - Get object ID pointed to by reference
    - `ObjectId
      masterTip = master.getObjectId();`
  - Rev-parse
    - `ObjectId obj = repo.resolve("HEAD^{tree}");`
  - Load raw object contents
    - `ObjectLoader loader = repo.open(masterTip);`
    - `loader.copyTo(System.out);`
  - Create a branch
    - `RefUpdate createBranch1 = repo.updateRef("refs/heads/branch1");`
    - `createBranch1.setNewObjectId(masterTip);`
    - `createBranch1.update();`
  - Delete a branch
    - `RefUpdate deleteBranch1 = repo.updateRef("refs/heads/branch1");`
    - `deleteBranch1.setForceUpdate(true);`
    - `deleteBranch1.delete();`
  - Config
    - `Config cfg = repo.getConfig();`
    - `String name = cfg.getString("user", null, "name");`
- Explanation: references (`Ref`)
  - `repo.getRef("master")`
    - JGit automatically grabs the actual master ref at `refs/heads/master`
    - returns a `Ref` object for reading information about the reference
  - `Ref` info available
    - name: `.getName()`
    - direct reference target object: `.getObjectId()`
    - symbolic reference target reference: `.getTarget()`
  - `Ref` objects also used for
    - tag refs
    - tag objects
  - Tag “peeled” concept
    - peeled = points to final target of a (potentially long) string of tag objects
- Explanation: object IDs (`ObjectId`)
  - represents SHA-1 hash of an object
  - object might or might not exist in the object database
- Explanation: rev-parse (`repo.resolve(...)`)
  - accepts any object specifier Git understands
  - returns
    - a valid `ObjectId`, or
    - `null`
  - reference: “see Branch References”
- Explanation: raw object access (`ObjectLoader`)
  - can stream contents
    - `ObjectLoader.copyTo(...)`
  - other capabilities mentioned
    - read type and size of object
    - return contents as a byte array
  - large object handling
    - when `.isLarge()` is `true`
    - `.openStream()` returns an InputStream-like object
    - reads raw data without pulling everything into memory at once
- Explanation: creating a branch (`RefUpdate`)
  - create `RefUpdate`
  - set new object ID
  - call `.update()` to trigger change
- Explanation: deleting a branch
  - requires `.setForceUpdate(true)`
    - otherwise `.delete()` returns `REJECTED`
    - and nothing happens
- Explanation: config (`Config`)
  - get via `repo.getConfig()`
  - example value read
    - `user.name` via `cfg.getString("user", null, "name")`
  - config resolution behavior
    - uses repository for local configuration
    - automatically detects global and system config files
    - reads values from them as well
- Error handling in JGit (not shown in code sample)
  - handled via exceptions
  - may throw standard Java exceptions
    - example: `IOException`
  - also has JGit-specific exceptions (examples)
    - `NoRemoteRepositoryException`
    - `CorruptObjectException`
    - `NoMergeBaseException`
- Scope note
  - this is only a small sampling of the full plumbing API
  - many more methods/classes exist

### Porcelain (JGit)

- Why porcelain exists
  - plumbing APIs are rather complete
  - but can be cumbersome to string together for common goals
    - adding a file to the index
    - making a new commit
- Entry point class
  - `Git`
  - construction shown
    - `Repository repo;`
    - `// construct repo...`
    - `Git git = new Git(repo);`

#### Porcelain command pattern (Git class)

- Pattern
  - `Git` methods return a command object
  - chain method calls to set parameters
  - execute via `.call()`

#### Example: like `git ls-remote`

- Credentials
  - `CredentialsProvider cp = new UsernamePasswordCredentialsProvider("username", "p4ssw0rd");`
- Command chain
  - `Collection<Ref> remoteRefs = git.lsRemote()`
  - `.setCredentialsProvider(cp)`
  - `.setRemote("origin")`
  - `.setTags(true)`
  - `.setHeads(false)`
  - `.call();`
- Output loop
  - `for (Ref ref : remoteRefs) {`
  - ` System.out.println(ref.getName() + " -> " + ref.getObjectId().name());`
  - `}`
- What it requests
  - tags from `origin`
  - not heads
- Authentication note
  - uses a `CredentialsProvider`

#### Other commands available through `Git` (examples listed)

- add
- blame
- commit
- clean
- push
- rebase
- revert
- reset

### Further Reading (JGit)

- Official JGit API documentation
  - https://www.eclipse.org/jgit/documentation
  - standard
    Javadoc
  - JVM IDEs can install locally as well
- JGit Cookbook
  - https://github.com/centic9/jgit-cookbook
  - many examples of specific tasks

## go-git

- When to use
  - integrate Git into a service written in Golang
- What it is
  - pure Go library implementation
  - no native dependencies
    - not prone to manual memory management errors
  - transparent to standard Golang performance analysis tooling
    - CPU profilers
    - memory profilers
    - race detector
    - etc.
- Focus
  - extensibility
  - compatibility
- Compatibility / API coverage note
  - supports most plumbing APIs
  - compatibility documented at:
    - https://github.com/go-git/go-git/blob/master/COMPATIBILITY.md

### Basic go-git example

- Import
  - `import "github.com/go-git/go-git/v5"`
- Clone
  - `r, err := git.PlainClone("/tmp/foo", false, &git.CloneOptions{`
  - ` URL: "https://github.com/go-git/go-git",`
  - ` Progress: os.Stdout,`
  - `})`

### After you have a `Repository` instance

- “Access information and perform mutations”
- Example operations shown
  - Get branch pointed by `HEAD`
    - `ref, err := r.Head()`
  - Get commit object pointed by `ref`
    - `commit, err := r.CommitObject(ref.Hash())`
  - Get commit history
    - `history, err := commit.History()`
  - Iterate commits and print each
    - `for _, c := range history {`
    - ` fmt.Println(c)`
    - `}`

### Advanced Functionality (go-git)

- Feature: pluggable storage system
  - similar to Libgit2 backends
  - default implementation: in-memory storage
    - “very fast”
  - example: clone into memory storage
    - `r, err := git.Clone(memory.NewStorage(), nil, &git.CloneOptions{`
    - ` URL: "https://github.com/go-git/go-git",`
    - `})`
  - Storage options example
    - store references, objects, and configuration in Aerospike
    - example location:
      - https://github.com/go-git/go-git/tree/master/_examples/storage
- Feature: flexible filesystem abstraction
  - uses go-billy `Filesystem`
    - https://pkg.go.dev/github.com/go-git/go-billy/v5?tab=doc#Filesystem
  - makes it easy to store files differently
    - pack all files into a single archive
      on disk
    - keep all files in-memory
- Advanced use-case: fine-tunable HTTP client
  - example referenced:
    - https://github.com/go-git/go-git/blob/master/_examples/custom_http/main.go
  - custom client shown
    - `customClient := &http.Client{`
    - ` Transport: &http.Transport{ // accept any certificate (might be useful for testing)`
    - ` TLSClientConfig: &tls.Config{InsecureSkipVerify: true},`
    - ` },`
    - ` Timeout: 15 * time.Second, // 15 second timeout`
    - ` CheckRedirect: func(req *http.Request, via []*http.Request) error {`
    - ` return http.ErrUseLastResponse // don't follow redirect`
    - ` },`
    - `}`
  - override protocol handling
    - `client.InstallProtocol("https", githttp.NewClient(customClient))`
    - purpose: override http(s) default protocol to use custom client
  - clone using new client (for `https://`)
    - `r, err := git.Clone(memory.NewStorage(), nil, &git.CloneOptions{URL: url})`

### Further Reading (go-git)

- Scope note
  - full treatment outside scope of the book
- API documentation
  - https://pkg.go.dev/github.com/go-git/go-git/v5
- Usage examples
  - https://github.com/go-git/go-git/tree/master/_examples

## Dulwich

- What it is
  - pure-Python Git implementation: Dulwich
- Project hosting / site
  - https://www.dulwich.io/
- Goal
  - interface to Git repositories (local and remote)
  - does not call out to `git` directly
  - uses pure Python instead
- Performance note
  - optional C extensions
  - significantly improve performance
- API design
  - follows Git design
  - separates two API levels
    - plumbing
    - porcelain

### Dulwich plumbing example (lower-level API)

- Goal
  - access the commit message of the last commit
- Code and shown outputs
  - `from dulwich.repo import Repo`
  - `r = Repo('.')`
  - `r.head()`
  - `# '57fbe010446356833a6ad1600059d80b1e731e15'`
  - `c = r[r.head()]`
  - `c`
  - `# `
  - `c.message`
  - `# 'Add note about encoding.\n'`

### Dulwich porcelain example (high-level API)

- Goal
  - print a commit log using porcelain API
- Code and shown outputs
  - `from dulwich import porcelain`
  - `porcelain.log('.', max_entries=1)`
  - `#commit: 57fbe010446356833a6ad1600059d80b1e731e15`
  - `#Author: Jelmer Vernooij `
  - `#Date: Sat Apr 29 2017 23:57:34 +0000`

### Further Reading (Dulwich)

- Available on official website
  - API documentation
  - tutorial
  - many task-focused examples
- URL
  - https://www.dulwich.io/