# Appendix B: Embedding Git in your Applications
## Why embed / integrate Git
- Target audience for integration
- Developer-focused applications
- likely benefit from integration with source control
- Non-developer applications
- example: document editors
- can benefit from version-control features
- Why Git specifically
- Git's model works very well for many different scenarios
## Two main integration options
- Option A: spawn a shell and call the `git` command-line program
- Option B: embed a Git library into your application
- This appendix covers
- command-line integration
- several of the most popular embeddable Git libraries
## Command-line Git (calling the `git` CLI)
- What it is
- spawn a shell process
- use the Git command-line tool to do the work
- Benefits
- canonical behavior
- all of Git's features are supported
- fairly easy to implement
- most runtime environments can invoke a process with command-line arguments
- Downsides
- Output is plain text
- you must parse Git's output to read progress/results
- Git's output format can change occasionally
- parsing can be inefficient and error-prone
- Lack of error recovery
- if repository is corrupted
- or user has malformed configuration value
- Git may refuse to perform many operations
- Process management complexity
- must maintain a shell environment in a separate process
- coordinating many processes can be challenging
- especially if multiple processes may access the same repository
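
The trade-offs above can be sketched in a few lines. This is a minimal illustration, assuming a `git` executable is on the `PATH`; the helper names (`run_git`, `parse_porcelain_status`, `changed_files`) are hypothetical. It uses `git status --porcelain=v1`, an output format intended for scripts, and the parser is deliberately simplified.

```python
import subprocess


def run_git(args, cwd="."):
    """Run `git <args>` in cwd and return its stdout; raise on a non-zero exit."""
    result = subprocess.run(
        ["git", *args],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if git exits non-zero
    )
    return result.stdout


def parse_porcelain_status(output):
    """Parse `git status --porcelain=v1` text into (status, path) pairs.

    Simplified: rename entries ("old -> new") and quoted paths are not handled.
    """
    entries = []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        # Two status characters, one space, then the path.
        entries.append((line[:2], line[3:]))
    return entries


def changed_files(repo_path):
    """Hypothetical convenience wrapper: list changed paths in a working tree."""
    return parse_porcelain_status(run_git(["status", "--porcelain=v1"], cwd=repo_path))
```

Keeping the parser separate from the process call means the fragile part (text parsing) can be tested against canned output without spawning Git at all.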
## Libgit2
- What it is
- dependency-free implementation of Git
- focus: a nice API for use within other programs
- website: https://libgit2.org
### Libgit2 C API (whirlwind tour)
- Example flow shown
- Open a repository
- `git_repository *repo;`
- `int error = git_repository_open(&repo, "/path/to/repository");`
- Dereference `HEAD` to a commit
- `git_object *head_commit;`
- `error = git_revparse_single(&head_commit, repo, "HEAD^{commit}");`
- `git_commit *commit = (git_commit*)head_commit;`
- Print commit properties
- `printf("%s", git_commit_message(commit));`
- `const git_signature *author = git_commit_author(commit);`
- `printf("%s <%s>\n", author->name, author->email);`
- `const git_oid *tree_id = git_commit_tree_id(commit);`
- Cleanup
- `git_commit_free(commit);`
- `git_repository_free(repo);`
- Repository opening details
- `git_repository` type
- handle to a repository with an in-memory cache
- `git_repository_open`
- simplest method when you know exact path to working directory or `.git` folder
- other APIs mentioned
- `git_repository_open_ext`
- includes options for searching
- `git_clone` (and friends)
- make a local clone of a remote repository
- `git_repository_init`
- create an entirely new repository
- Dereferencing `HEAD` details
- rev-parse usage
- uses rev-parse syntax
- reference: “see Branch References for more on this”
- return type
- `git_revparse_single` returns a `git_object*`
- represents something that exists in the repository's Git object database
- `git_object` is a “parent” type for several object kinds
- child types share the same memory layout as `git_object`
- safe to cast to the correct “child” type when appropriate
- cast safety note in this example
- `git_object_type(commit)` would return `GIT_OBJ_COMMIT`
- therefore it's safe to cast to `git_commit*`
- Commit property access details
- message
- `git_commit_message(commit)`
- author signature
- `git_commit_author(commit)` returns `const git_signature *`
- fields shown
- `author->name`
- `author->email`
- tree id
- `git_commit_tree_id(commit)` returns a `const git_oid *`
- `git_oid`
- Libgit2 representation for a SHA-1 hash
### Patterns illustrated by the Libgit2 C sample
- Error-code style
- pattern: declare pointer, pass its address into a Libgit2 call
- return value: integer error code
- `0` = success
- `< 0` = error
- Memory / ownership rules
- if Libgit2 populates a pointer for you
- you must free it
- if Libgit2 returns a `const` pointer
- you don't free it
- it becomes invalid when the owning object is freed
- Practical note
- “Writing C is a bit painful.”
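
One way a caller can tame the error-code convention is to route every return value through a single helper that raises on negative values. The sketch below only models the convention in Python; `GitError`, `check`, and `fake_repository_open` are hypothetical names, and the code `-3` is an arbitrary stand-in rather than a documented value.

```python
class GitError(Exception):
    """Hypothetical exception carrying a negative C-style error code."""

    def __init__(self, code, operation):
        super().__init__(f"{operation} failed with error code {code}")
        self.code = code
        self.operation = operation


def check(code, operation):
    """Return the code unchanged on success (>= 0); raise GitError if negative."""
    if code < 0:
        raise GitError(code, operation)
    return code


def fake_repository_open(path):
    """Stand-in for a C call: 0 on success, an arbitrary negative value on failure."""
    return 0 if path else -3


# Success passes straight through; failure surfaces as an exception.
check(fake_repository_open("/path/to/repository"), "open repository")
try:
    check(fake_repository_open(""), "open repository")
except GitError as exc:
    print(exc)
```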
### Language bindings (Libgit2 ecosystem)
- Implication of “writing C is painful”
- you're unlikely to write C when using Libgit2
- there are language-specific bindings that make integration easier
#### Ruby bindings: Rugged
- Name: Rugged
- URL: https://github.com/libgit2/rugged
- Example equivalent to the C code
- `repo = Rugged::Repository.new('path/to/repository')`
- `commit = repo.head.target`
- `puts commit.message`
- `puts "#{commit.author[:name]} <#{commit.author[:email]}>"`
- `tree = commit.tree`
- Why it's “less cluttered”
- error handling
- Rugged uses exceptions
- examples mentioned: `ConfigError`, `ObjectError`
- resource management
- no explicit freeing
- Ruby is garbage-collected
- Example: crafting a commit from scratch (Rugged)
- Code sequence shown (with numbered markers)
- ① create a new blob
- `blob_id = repo.write("Blob contents", :blob) ①`
- work with index
- `index = repo.index`
- `index.read_tree(repo.head.target.tree)`
- ② add a new file entry
- `index.add(:path => 'newfile.txt', :oid => blob_id) ②`
- build a signature hash
- `sig = {`
- ` :email => "bob@example.com",`
- ` :name => "Bob User",`
- ` :time => Time.now,`
- `}`
- create the commit (with parameters)
- `commit_id = Rugged::Commit.create(repo,`
- ` :tree => index.write_tree(repo), ③`
- ` :author => sig,`
- ` :committer => sig, ④`
- ` :message => "Add newfile.txt", ⑤`
- ` :parents => repo.empty? ? [] : [ repo.head.target ].compact, ⑥`
- ` :update_ref => 'HEAD', ⑦`
- `)`
- ⑧ look up the created commit object
- `commit = repo.lookup(commit_id) ⑧`
- Meaning of each numbered step (①–⑧)
- ① Create a new blob
- contains the contents of a new file
- ② Populate index and add file
- populate index with head commit's tree
- add the new file at path `newfile.txt`
- ③ Create a new tree in the ODB
- uses it for the new commit
- ④ Author and committer fields
- same signature used for both
- ⑤ Commit message
- `"Add newfile.txt"`
- ⑥ Parents
- when creating a commit, you must specify parents
- uses the tip of `HEAD` for the single parent
- handles empty repository case
- ⑦ Update a ref (optional)
- Rugged (and Libgit2) can optionally update a reference when making a commit
- here it updates `HEAD`
- ⑧ Return value / lookup
- the return value is the SHA-1 hash of the new commit object
- you can use it to get a `Commit` object
- Performance note
- Ruby code is clean
- Libgit2 does heavy lifting → runs pretty fast
- Pointer to later section
- “If you're not a rubyist, we touch on some other bindings in Other Bindings.”
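
Steps ① and ⑧ both hand back an object ID. In a repository using the SHA-1 object format, that ID is the SHA-1 of a short header (`<type> <size>` followed by a NUL byte) plus the object's raw content. The sketch below reproduces the calculation with the standard library only; the function names are hypothetical and nothing is written to a repository.

```python
import hashlib


def git_object_id(obj_type, data):
    """Return the SHA-1 hex ID Git assigns to an object of the given type."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


def blob_id(data):
    """Return the ID a blob with these bytes would have."""
    return git_object_id("blob", data)


print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(blob_id(b"Blob contents"))
```

Because the ID depends only on type and content, writing the same blob twice yields the same ID, which is why the object database can deduplicate identical files.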
## Advanced Functionality (Libgit2)
- Out-of-core-Git capabilities
- Libgit2 has capabilities outside the scope of core Git
- Example capability: pluggability
- can provide custom “backends” for several operation types
- enables storage in a different way than stock Git
- backend types mentioned
- configuration
- ref storage
- object database
- “among other things”
### Custom backend example: object database (ODB)
- Example source
- from Libgit2 backend examples
- URL: https://github.com/libgit2/libgit2-backends
- Setup shown (with numbered markers)
- ① create ODB “frontend”
- `git_odb *odb;`
- `int error = git_odb_new(&odb); ①`
- meaning: initialize empty ODB frontend container for backends
- ② initialize custom backend
- `git_odb_backend *my_backend;`
- `error = git_odb_backend_mine(&my_backend, /*…*/); ②`
- ③ add backend to frontend
- `error = git_odb_add_backend(odb, my_backend, 1); ③`
- open a repository
- `git_repository *repo;`
- `error = git_repository_open(&repo, "some-path");`
- ④ set repository to use custom ODB
- `error = git_repository_set_odb(repo, odb); ④`
- meaning: repo uses this ODB to look up objects
- Note about the example's error handling
- errors are captured but not handled
- “We hope your code is better than ours.”
### Implementing `git_odb_backend_mine`
- What it is
- constructor for your own ODB implementation
- Requirement
- fill in the `git_odb_backend` structure properly
- Example struct layout shown
- `typedef struct {`
- ` git_odb_backend parent;`
- ` // Some other stuff`
- ` void *custom_context;`
- `} my_backend_struct;`
- Subtle memory-layout constraint
- `my_backend_struct`'s first member must be a `git_odb_backend` structure
- ensures Libgit2 sees the memory layout it expects
- Flexibility
- the rest of the struct is arbitrary
- can be as large or small as needed
- Example initialization function responsibilities shown
- allocate
- `backend = calloc(1, sizeof (my_backend_struct));`
- set custom context
- `backend->custom_context = …;`
- fill supported function pointers in `parent`
- `backend->parent.read = &my_backend__read;`
- `backend->parent.read_prefix = &my_backend__read_prefix;`
- `backend->parent.read_header = &my_backend__read_header;`
- `// …`
- return it through output parameter
- `*backend_out = (git_odb_backend *) backend;`
- return success constant
- `return GIT_SUCCESS;`
- Where to find full signatures
- Libgit2 source file:
- `include/git2/sys/odb_backend.h`
- which signatures to implement depends on use case
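
The "first member" constraint can be checked without a C compiler. The sketch below uses Python's `ctypes` to model two C structs; `ParentBackend` is a hypothetical two-field stand-in, not the real `git_odb_backend` layout. It shows that when the parent struct sits at offset 0, a pointer to the custom struct can be reinterpreted as a pointer to the parent.

```python
import ctypes


class ParentBackend(ctypes.Structure):
    """Hypothetical stand-in for the library-defined parent struct."""
    _fields_ = [
        ("version", ctypes.c_uint),
        ("read", ctypes.c_void_p),  # placeholder for a function pointer
    ]


class MyBackend(ctypes.Structure):
    """Custom struct: the parent comes first, then arbitrary extra state."""
    _fields_ = [
        ("parent", ParentBackend),
        ("custom_context", ctypes.c_void_p),
    ]


backend = MyBackend()
backend.parent.version = 1
backend.custom_context = 0xBEEF

# Reinterpret the same memory as the parent type, as a C cast would.
as_parent = ctypes.cast(ctypes.pointer(backend), ctypes.POINTER(ParentBackend))

print(MyBackend.parent.offset)     # 0
print(as_parent.contents.version)  # 1
```

If `custom_context` were declared before `parent`, the offset would be non-zero and the reinterpreted pointer would read the wrong bytes, which is the failure the constraint guards against.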
## Other Bindings (Libgit2)
- Breadth
- bindings exist for many languages
- Section purpose
- show small examples using a few more complete bindings packages (as of writing)
- Other languages mentioned as having libraries (various maturity)
- C++
- Go
- Node.js
- Erlang
- JVM
- Official collection of bindings
- browse repos: https://github.com/libgit2
- Common goal for the code in this section
- return the commit message from the commit eventually pointed to by `HEAD`
- “sort of like `git log -1`”
### LibGit2Sharp
- For
- .NET or Mono applications
- URL
- https://github.com/libgit2/libgit2sharp
- Characteristics
- bindings written in C#
- wraps raw Libgit2 calls with native-feeling CLR APIs
- Example program (single expression)
- `new Repository(@"C:\path\to\repo").Head.Tip.Message;`
- Desktop Windows note
- NuGet package available to get started quickly
### objective-git
- Platform context
- Apple platform
- likely using Objective-C as implementation language
- URL
- https://github.com/libgit2/objective-git
- Example program outline
- initialize repo
- `GTRepository *repo =`
- ` [[GTRepository alloc] initWithURL:[NSURL fileURLWithPath: @"/path/to/repo"]`
- `error:NULL];`
- retrieve commit message
- `NSString *msg = [[[repo headReferenceWithError:NULL] resolvedTarget] message];`
- Swift note
- objective-git is fully interoperable with Swift
### pygit2
- What it is
- Python bindings for Libgit2
- URL
- https://www.pygit2.org
- Example program (chained calls; wrap the chain in parentheses to run it as a single multi-line expression)
- `pygit2.Repository("/path/to/repo") # open repository`
- `.head # get the current branch`
- `.peel(pygit2.Commit) # walk down to the commit`
- `.message # read the message`
## Further Reading (Libgit2)
- Scope note
- full treatment of Libgit2 capabilities is outside the scope of the book
- Libgit2 resources
- API documentation: https://libgit2.github.com/libgit2
- guides: https://libgit2.github.com/docs
- Other bindings
- check bundled README and tests
- often have small tutorials and pointers to further reading
## JGit
- Purpose
- use Git from within a Java program
- What it is
- fully featured Git library called JGit
- relatively full-featured implementation of Git written natively in Java
- widely used in the Java community
- under the Eclipse umbrella
- Home
- https://www.eclipse.org/jgit/
### Getting Set Up (JGit)
- Multiple ways to connect project to JGit
- Easiest path: Maven
- add dependency snippet to `<dependencies>` in `pom.xml`
- `<dependency>`
- ` <groupId>org.eclipse.jgit</groupId>`
- ` <artifactId>org.eclipse.jgit</artifactId>`
- ` <version>3.5.0.201409260305-r</version>`
- `</dependency>`
- version note
- likely advanced by the time you read this
- check updates:
- https://mvnrepository.com/artifact/org.eclipse.jgit/org.eclipse.jgit
- result
- Maven automatically acquires and uses the JGit libraries you need
- Manual dependency management
- pre-built binaries
- https://www.eclipse.org/jgit/download
- compile/run examples
- `javac -cp .:org.eclipse.jgit-3.5.0.201409260305-r.jar App.java`
- `java -cp .:org.eclipse.jgit-3.5.0.201409260305-r.jar App`
### Plumbing (JGit)
- Two levels of API
- plumbing
- porcelain
- Terminology source: Git itself
- porcelain APIs
- friendly front-end for common user-level actions
- like what a normal user would use the Git command-line tool for
- plumbing APIs
- interact with low-level repository objects directly
#### Starting point: `Repository`
- Starting point for most JGit sessions
- class: `Repository`
- Creating/opening a filesystem-based repository
- note: JGit also allows other storage models
- Create new repository
- `Repository newlyCreatedRepo = FileRepositoryBuilder.create(new File("/tmp/new_repo/.git"));`
- `newlyCreatedRepo.create();`
- Open existing repository
- `Repository existingRepo = new FileRepositoryBuilder()`
- `.setGitDir(new File("my_repo/.git"))`
- `.build();`
#### `FileRepositoryBuilder` (finding repositories)
- Builder style
- fluent API
- Helps locate a Git repository
- whether or not your program knows exactly where it's located
- Methods/strategies mentioned
- environment variables
- `.readEnvironment()`
- search starting from working directory
- `.setWorkTree(…).findGitDir()`
- open known `.git` directory
- `.setGitDir(...)` (as in example)
#### Plumbing API: quick sampling + explanations
- Sampling actions shown (code outline)
- Get a reference
- `Ref master = repo.getRef("master");`
- Get object ID pointed to by reference
- `ObjectId masterTip = master.getObjectId();`
- Rev-parse
- `ObjectId obj = repo.resolve("HEAD^{tree}");`
- Load raw object contents
- `ObjectLoader loader = repo.open(masterTip);`
- `loader.copyTo(System.out);`
- Create a branch
- `RefUpdate createBranch1 = repo.updateRef("refs/heads/branch1");`
- `createBranch1.setNewObjectId(masterTip);`
- `createBranch1.update();`
- Delete a branch
- `RefUpdate deleteBranch1 = repo.updateRef("refs/heads/branch1");`
- `deleteBranch1.setForceUpdate(true);`
- `deleteBranch1.delete();`
- Config
- `Config cfg = repo.getConfig();`
- `String name = cfg.getString("user", null, "name");`
- Explanation: references (`Ref`)
- `repo.getRef("master")`
- JGit automatically grabs the actual master ref at `refs/heads/master`
- returns a `Ref` object for reading information about the reference
- `Ref` info available
- name: `.getName()`
- direct reference target object: `.getObjectId()`
- symbolic reference target reference: `.getTarget()`
- `Ref` objects also used for
- tag refs
- tag objects
- Tag “peeled” concept
- peeled = points to final target of a (potentially long) string of tag objects
- Explanation: object IDs (`ObjectId`)
- represents SHA-1 hash of an object
- object might or might not exist in the object database
- Explanation: rev-parse (`repo.resolve(...)`)
- accepts any object specifier Git understands
- returns
- a valid `ObjectId`, or
- `null`
- reference: “see Branch References”
- Explanation: raw object access (`ObjectLoader`)
- can stream contents
- `ObjectLoader.copyTo(...)`
- other capabilities mentioned
- read type and size of object
- return contents as a byte array
- large object handling
- when `.isLarge()` is `true`
- `.openStream()` returns an InputStream-like object
- reads raw data without pulling everything into memory at once
- Explanation: creating a branch (`RefUpdate`)
- create `RefUpdate`
- set new object ID
- call `.update()` to trigger change
- Explanation: deleting a branch
- requires `.setForceUpdate(true)`
- otherwise `.delete()` returns `REJECTED`
- and nothing happens
- Explanation: config (`Config`)
- get via `repo.getConfig()`
- example value read
- `user.name` via `cfg.getString("user", null, "name")`
- config resolution behavior
- uses repository for local configuration
- automatically detects global and system config files
- reads values from them as well
- Error handling in JGit (not shown in code sample)
- handled via exceptions
- may throw standard Java exceptions
- example: `IOException`
- also has JGit-specific exceptions (examples)
- `NoRemoteRepositoryException`
- `CorruptObjectException`
- `NoMergeBaseException`
- Scope note
- this is only a small sampling of the full plumbing API
- many more methods/classes exist
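
The short-name lookup and symbolic-reference behaviour described above can be modelled with a plain dictionary. This is a sketch of the idea, not JGit's implementation: the search order follows the rules in Git's revision documentation (simplified by omitting the `refs/remotes/<name>/HEAD` case), and `find_ref`, `resolve`, and the sample data are hypothetical.

```python
SEARCH_PREFIXES = ["", "refs/", "refs/tags/", "refs/heads/", "refs/remotes/"]

# Toy ref store: a value is either an object ID or "ref: <target>" (symbolic).
SAMPLE_REFS = {
    "HEAD": "ref: refs/heads/master",
    "refs/heads/master": "57fbe010446356833a6ad1600059d80b1e731e15",
    "refs/tags/v1.0": "015fc1267258458901a94d228e39f0a378370466",
}


def find_ref(refs, short_name):
    """Return the first full ref name matching short_name, or None."""
    for prefix in SEARCH_PREFIXES:
        candidate = prefix + short_name
        if candidate in refs:
            return candidate
    return None


def resolve(refs, name, max_depth=5):
    """Follow symbolic refs until an object ID is reached; None if unresolved."""
    full = find_ref(refs, name)
    for _ in range(max_depth):
        if full is None:
            return None
        value = refs[full]
        if not value.startswith("ref: "):
            return value
        full = find_ref(refs, value[len("ref: "):])
    return None


print(find_ref(SAMPLE_REFS, "master"))  # refs/heads/master
print(resolve(SAMPLE_REFS, "HEAD"))     # 57fbe010446356833a6ad1600059d80b1e731e15
```

Returning `None` for an unknown name mirrors the `repo.resolve(...)` behaviour noted above, where the result is either a valid ID or `null`.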
### Porcelain (JGit)
- Why porcelain exists
- plumbing APIs are rather complete
- but can be cumbersome to string together for common goals
- adding a file to the index
- making a new commit
- Entry point class
- `Git`
- construction shown
- `Repository repo;`
- `// construct repo...`
- `Git git = new Git(repo);`
#### Porcelain command pattern (Git class)
- Pattern
- `Git` methods return a command object
- chain method calls to set parameters
- execute via `.call()`
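
The command pattern itself is language-neutral. The toy sketch below shows its shape in Python: a factory method returns a command object, each setter returns `self` so calls can be chained, and nothing happens until `call()`. The classes and their defaults are hypothetical and only filter an in-memory dictionary; they do not reproduce JGit's behaviour.

```python
class LsRemoteCommand:
    """Hypothetical command object: setters return self, call() executes."""

    def __init__(self, refs):
        self._refs = refs
        self._heads = True
        self._tags = True

    def set_heads(self, enabled):
        self._heads = enabled
        return self

    def set_tags(self, enabled):
        self._tags = enabled
        return self

    def call(self):
        """Return the ref names selected by the current settings, sorted."""
        selected = []
        for name in sorted(self._refs):
            if name.startswith("refs/heads/") and self._heads:
                selected.append(name)
            elif name.startswith("refs/tags/") and self._tags:
                selected.append(name)
        return selected


class Git:
    """Hypothetical entry point that hands out command objects."""

    def __init__(self, refs):
        self._refs = refs

    def ls_remote(self):
        return LsRemoteCommand(self._refs)


git = Git({"refs/heads/master": "a", "refs/tags/v1.0": "b", "refs/tags/v0.9": "c"})
print(git.ls_remote().set_tags(True).set_heads(False).call())
```

Deferring work to `call()` lets a caller build a command in one place and execute it in another, and keeps optional parameters out of long constructor signatures.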
#### Example: like `git ls-remote`
- Credentials
- `CredentialsProvider cp = new UsernamePasswordCredentialsProvider("username", "p4ssw0rd");`
- Command chain
- `Collection<Ref> remoteRefs = git.lsRemote()`
- `.setCredentialsProvider(cp)`
- `.setRemote("origin")`
- `.setTags(true)`
- `.setHeads(false)`
- `.call();`
- Output loop
- `for (Ref ref : remoteRefs) {`
- ` System.out.println(ref.getName() + " -> " + ref.getObjectId().name());`
- `}`
- What it requests
- tags from `origin`
- not heads
- Authentication note
- uses a `CredentialsProvider`
#### Other commands available through `Git` (examples listed)
- add
- blame
- commit
- clean
- push
- rebase
- revert
- reset
### Further Reading (JGit)
- Official JGit API documentation
- https://www.eclipse.org/jgit/documentation
- standard Javadoc
- JVM IDEs can install locally as well
- JGit Cookbook
- https://github.com/centic9/jgit-cookbook
- many examples of specific tasks
## go-git
- When to use
- integrate Git into a service written in Golang
- What it is
- pure Go library implementation
- no native dependencies
- not prone to manual memory management errors
- transparent to standard Golang performance analysis tooling
- CPU profilers
- memory profilers
- race detector
- etc.
- Focus
- extensibility
- compatibility
- Compatibility / API coverage note
- supports most plumbing APIs
- compatibility documented at:
- https://github.com/go-git/go-git/blob/master/COMPATIBILITY.md
### Basic go-git example
- Import
- `import "github.com/go-git/go-git/v5"`
- Clone
- `r, err := git.PlainClone("/tmp/foo", false, &git.CloneOptions{`
- ` URL: "https://github.com/go-git/go-git",`
- ` Progress: os.Stdout,`
- `})`
### After you have a `Repository` instance
- “Access information and perform mutations”
- Example operations shown
- Get branch pointed by `HEAD`
- `ref, err := r.Head()`
- Get commit object pointed by `ref`
- `commit, err := r.CommitObject(ref.Hash())`
- Get commit history
- `history, err := commit.History()`
- Iterate commits and print each
- `for _, c := range history {`
- ` fmt.Println(c)`
- `}`
### Advanced Functionality (go-git)
- Feature: pluggable storage system
- similar to Libgit2 backends
- default implementation: in-memory storage
- “very fast”
- example: clone into memory storage
- `r, err := git.Clone(memory.NewStorage(), nil, &git.CloneOptions{`
- ` URL: "https://github.com/go-git/go-git",`
- `})`
- Storage options example
- store references, objects, and configuration in Aerospike
- example location:
- https://github.com/go-git/go-git/tree/master/_examples/storage
- Feature: flexible filesystem abstraction
- uses go-billy `Filesystem`
- https://pkg.go.dev/github.com/go-git/go-billy/v5?tab=doc#Filesystem
- makes it easy to store files differently
- pack all files into a single archive on disk
- keep all files in-memory
- Advanced use-case: fine-tunable HTTP client
- example referenced:
- https://github.com/go-git/go-git/blob/master/_examples/custom_http/main.go
- custom client shown
- `customClient := &http.Client{`
- ` Transport: &http.Transport{ // accept any certificate (might be useful for testing)`
- ` TLSClientConfig: &tls.Config{InsecureSkipVerify: true},`
- ` },`
- ` Timeout: 15 * time.Second, // 15 second timeout`
- ` CheckRedirect: func(req *http.Request, via []*http.Request) error {`
- ` return http.ErrUseLastResponse // don't follow redirect`
- ` },`
- `}`
- override protocol handling
- `client.InstallProtocol("https", githttp.NewClient(customClient))`
- purpose: override http(s) default protocol to use custom client
- clone using new client (for `https://`)
- `r, err := git.Clone(memory.NewStorage(), nil, &git.CloneOptions{URL: url})`
### Further Reading (go-git)
- Scope note
- full treatment outside scope of the book
- API documentation
- https://pkg.go.dev/github.com/go-git/go-git/v5
- Usage examples
- https://github.com/go-git/go-git/tree/master/_examples
## Dulwich
- What it is
- pure-Python Git implementation: Dulwich
- Project hosting / site
- https://www.dulwich.io/
- Goal
- interface to Git repositories (local and remote)
- does not call out to `git` directly
- uses pure Python instead
- Performance note
- optional C extensions
- significantly improve performance
- API design
- follows Git design
- separates two API levels
- plumbing
- porcelain
### Dulwich plumbing example (lower-level API)
- Goal
- access the commit message of the last commit
- Code and shown outputs
- `from dulwich.repo import Repo`
- `r = Repo('.')`
- `r.head()`
- `# '57fbe010446356833a6ad1600059d80b1e731e15'`
- `c = r[r.head()]`
- `c`
- `# <Commit 015fc1267258458901a94d228e39f0a378370466>`
- `c.message`
- `# 'Add note about encoding.\n'`
### Dulwich porcelain example (high-level API)
- Goal
- print a commit log using porcelain API
- Code and shown outputs
- `from dulwich import porcelain`
- `porcelain.log('.', max_entries=1)`
- `#commit: 57fbe010446356833a6ad1600059d80b1e731e15`
- `#Author: Jelmer Vernooij <jelmer@jelmer.uk>`
- `#Date: Sat Apr 29 2017 23:57:34 +0000`
### Further Reading (Dulwich)
- Available on official website
- API documentation
- tutorial
- many task-focused examples
- URL
- https://www.dulwich.io/