Appendix B: Embedding Git in your Applications

Why embed / integrate Git

  • Target audience for integration
    • Developer-focused applications
      • likely benefit from integration with source control
    • Non-developer applications
      • example: document editors
      • can benefit from version-control features
  • Why Git specifically
    • Git's model works very well for many different scenarios

Two main integration options

  • Option A: spawn a shell and call the git command-line program
  • Option B: embed a Git library into your application
  • This appendix covers
    • command-line integration
    • several of the most popular embeddable Git libraries

Command-line Git (calling the git CLI)

  • What it is
    • spawn a shell process
    • use the Git command-line tool to do the work
  • Benefits
    • canonical behavior
    • all of Git's features are supported
    • fairly easy to implement
      • most runtime environments can invoke a process with command-line arguments
  • Downsides
    • Output is plain text
      • you must parse Git's output to read progress/results
      • Git's output format can change occasionally
      • parsing can be inefficient and error-prone
    • Lack of error recovery
      • if the repository is corrupted
      • or the user has a malformed configuration value
      • Git may refuse to perform many operations
    • Process management complexity
      • must maintain a shell environment in a separate process
      • coordinating many processes can be challenging
        • especially if multiple processes may access the same repository
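
The trade-offs above can be made concrete with a minimal sketch of command-line integration. It assumes a `git` binary on the PATH and uses the `--porcelain=v1 -z` form of `git status`, which is intended for scripts and is less likely to change than the human-readable output. The helper names `run_git`, `parse_status_z`, and `status` are hypothetical, chosen only for this illustration.

```python
import subprocess


def run_git(repo_path, *args):
    """Run `git <args>` inside repo_path and return its raw stdout bytes.

    A non-zero exit status raises subprocess.CalledProcessError, which is
    how a corrupted repository or a malformed configuration value surfaces.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, *args],
        check=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.stdout


def parse_status_z(raw):
    """Parse the output of `git status --porcelain=v1 -z`.

    Returns a list of (xy, path, original_path) tuples. original_path is
    None except for renames and copies, where Git emits the old path as an
    extra NUL-terminated field after the new one.
    """
    fields = raw.split(b"\0")
    entries = []
    i = 0
    while i < len(fields):
        field = fields[i]
        i += 1
        if not field:
            continue  # empty piece left over after the final NUL
        xy = field[:2].decode("ascii")
        path = field[3:].decode("utf-8", errors="surrogateescape")
        original = None
        if "R" in xy or "C" in xy:
            original = fields[i].decode("utf-8", errors="surrogateescape")
            i += 1
        entries.append((xy, path, original))
    return entries


def status(repo_path):
    """Convenience wrapper: run git and parse its machine-readable output."""
    return parse_status_z(run_git(repo_path, "status", "--porcelain=v1", "-z"))
```

Keeping the parser separate from the process call means the text-handling logic can be exercised on captured output without spawning Git, which helps contain the "error-prone parsing" risk listed above.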

Libgit2

  • What it is
    • dependency-free implementation of Git
    • focus: a nice API for use within other programs
    • website: https://libgit2.org

Libgit2 C API (whirlwind tour)

  • Example flow shown

    • Open a repository
      • git_repository *repo;
      • int error = git_repository_open(&repo, "/path/to/repository");
    • Dereference HEAD to a commit
      • git_object *head_commit;
      • error = git_revparse_single(&head_commit, repo, "HEAD^{commit}");
      • git_commit *commit = (git_commit*)head_commit;
    • Print commit properties
      • printf("%s", git_commit_message(commit));
      • const git_signature *author = git_commit_author(commit);
      • printf("%s <%s>\n", author->name, author->email);
      • const git_oid *tree_id = git_commit_tree_id(commit);
    • Cleanup
      • git_commit_free(commit);
      • git_repository_free(repo);
  • Repository opening details

    • git_repository type
      • handle to a repository with an in-memory cache
    • git_repository_open
      • simplest method when you know the exact path to the working directory or .git folder
    • other APIs mentioned
      • git_repository_open_ext
        • includes options for searching
      • git_clone (and friends)
        • make a local clone of a remote repository
      • git_repository_init
        • create an entirely new repository
  • Dereferencing HEAD details

    • rev-parse usage
      • uses rev-parse syntax
      • reference: “see Branch References for more on this”
    • return type
      • git_revparse_single returns a git_object*
        • represents something that exists in the repository's Git object database
        • git_object is a “parent” type for several object kinds
        • child types share the same memory layout as git_object
          • safe to cast to the correct “child” type when appropriate
    • cast safety note in this example
      • git_object_type(head_commit) would return GIT_OBJ_COMMIT (named GIT_OBJECT_COMMIT in newer Libgit2 releases)
      • therefore it's safe to cast to git_commit*
  • Commit property access details

    • message
      • git_commit_message(commit)
    • author signature
      • git_commit_author(commit) returns const git_signature *
      • fields shown
        • author->name
        • author->email
    • tree id
      • git_commit_tree_id(commit) returns a const git_oid *
      • git_oid
        • Libgit2 representation for a SHA-1 hash
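
To make the SHA-1 hash behind a git_oid less abstract, here is a small sketch of how Git derives an object ID: it hashes a short header (object kind, payload length, NUL byte) followed by the payload. The function names are hypothetical, and the sketch assumes a repository using the default SHA-1 object format.

```python
import hashlib


def git_object_id(kind, payload):
    """Return the 40-character hex SHA-1 Git assigns to an object."""
    header = f"{kind} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def blob_id(content):
    """Object ID of a blob holding the given bytes."""
    return git_object_id("blob", content)


empty_blob = blob_id(b"")
hello_blob = blob_id(b"hello world\n")
raw_oid = bytes.fromhex(hello_blob)  # the 20 raw bytes behind the hex form
```

The hex string is what command-line Git prints; the 20 raw bytes are the form a binary ID type would carry internally.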

Patterns illustrated by the Libgit2 C sample

  • Error-code style
    • pattern: declare pointer, pass its address into a Libgit2 call
    • return value: integer error code
      • 0 = success
      • < 0 = error
  • Memory / ownership rules
    • if Libgit2 populates a pointer for you
      • you must free it
    • if Libgit2 returns a const pointer
      • you don't free it
      • it becomes invalid when the owning object is freed
  • Practical note
    • “Writing C is a bit painful.”

Language bindings (Libgit2 ecosystem)

  • Implication of “writing C is painful”
    • you're unlikely to write C when using Libgit2
    • there are language-specific bindings that make integration easier

Ruby bindings: Rugged

  • Name: Rugged

  • URL: https://github.com/libgit2/rugged

  • Example equivalent to the C code

    • repo = Rugged::Repository.new('path/to/repository')
    • commit = repo.head.target
    • puts commit.message
    • puts "#{commit.author[:name]} <#{commit.author[:email]}>"
    • tree = commit.tree
  • Why it's “less cluttered”

    • error handling
      • Rugged uses exceptions
      • examples mentioned: ConfigError, ObjectError
    • resource management
      • no explicit freeing
      • Ruby is garbage-collected
  • Example: crafting a commit from scratch (Rugged)

    • Code sequence shown (with numbered markers)

      • ① create a new blob
        • blob_id = repo.write("Blob contents", :blob) ①
      • work with index
        • index = repo.index
        • index.read_tree(repo.head.target.tree)
      • ② add a new file entry
        • index.add(:path => 'newfile.txt', :oid => blob_id) ②
      • build a signature hash
        • sig = {
        • :email => "bob@example.com",
        • :name => "Bob User",
        • :time => Time.now,
        • }
      • create the commit (with parameters)
        • commit_id = Rugged::Commit.create(repo,
        • :tree => index.write_tree(repo), ③
        • :author => sig,
        • :committer => sig, ④
        • :message => "Add newfile.txt", ⑤
        • :parents => repo.empty? ? [] : [ repo.head.target ].compact, ⑥
        • :update_ref => 'HEAD', ⑦
        • )
      • ⑧ look up the created commit object
        • commit = repo.lookup(commit_id) ⑧
    • Meaning of each numbered step (①–⑧)

      • ① Create a new blob
        • contains the contents of a new file
      • ② Populate index and add file
        • populate index with the head commit's tree
        • add the new file at path newfile.txt
      • ③ Create a new tree in the ODB
        • uses it for the new commit
      • ④ Author and committer fields
        • same signature used for both
      • ⑤ Commit message
        • "Add newfile.txt"
      • ⑥ Parents
        • when creating a commit, you must specify parents
        • uses the tip of HEAD for the single parent
        • handles empty repository case
      • ⑦ Update a ref (optional)
        • Rugged (and Libgit2) can optionally update a reference when making a commit
        • here it updates HEAD
      • ⑧ Return value / lookup
        • the return value is the SHA-1 hash of the new commit object
        • you can use it to get a Commit object
  • Performance note

    • Ruby code is clean
    • Libgit2 does heavy lifting → runs pretty fast
  • Pointer to later section

    • “If you're not a Rubyist, we touch on some other bindings in Other Bindings.”
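
The parameters passed when crafting a commit from scratch (tree, author, committer, message, parents) map directly onto the text stored in a commit object. The sketch below assembles that text by hand and hashes it. It is an illustration of the object format only, not of what Rugged does internally: it writes nothing to a repository, updates no reference, always records times as UTC, and uses the well-known empty-tree ID rather than a tree containing newfile.txt. The function names and the fixed timestamp are assumptions.

```python
import hashlib


def format_signature(sig):
    """Render a signature mapping as 'Name <email> epoch +0000'.

    Assumption: the time zone offset is always written as UTC.
    """
    return f"{sig['name']} <{sig['email']}> {int(sig['time'])} +0000"


def build_commit(tree_id, parents, author, committer, message):
    """Return (object_id, raw_body) for a commit with the given fields."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parents]  # zero lines for a root commit
    lines.append(f"author {format_signature(author)}")
    lines.append(f"committer {format_signature(committer)}")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    header = f"commit {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest(), body


sig = {"name": "Bob User", "email": "bob@example.com", "time": 1700000000}
empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

# Empty repository case: no parents.
root_id, root_body = build_commit(empty_tree, [], sig, sig, "Add newfile.txt\n")
# Normal case: the previous tip becomes the single parent.
child_id, child_body = build_commit(empty_tree, [root_id], sig, sig, "Second\n")
```

Because the parent IDs are part of the hashed text, changing any ancestor changes every descendant's ID, which is why the parents must be supplied when the commit is created.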

Advanced Functionality (Libgit2)

  • Out-of-core-Git capabilities
    • Libgit2 has capabilities outside the scope of core Git
  • Example capability: pluggability
    • can provide custom “backends” for several operation types
    • enables storage in a different way than stock Git
    • backend types mentioned
      • configuration
      • ref storage
      • object database
      • “among other things”

Custom backend example: object database (ODB)

  • Example source
  • Setup shown (with numbered markers)
    • ① create ODB “frontend”
      • git_odb *odb;
      • int error = git_odb_new(&odb); ①
      • meaning: initialize empty ODB frontend container for backends
    • ② initialize custom backend
      • git_odb_backend *my_backend;
      • error = git_odb_backend_mine(&my_backend, /*…*/); ②
    • ③ add backend to frontend
      • error = git_odb_add_backend(odb, my_backend, 1); ③
    • open a repository
      • git_repository *repo;
      • error = git_repository_open(&repo, "some-path");
    • ④ set repository to use custom ODB
      • error = git_repository_set_odb(repo, odb); ④
      • meaning: repo uses this ODB to look up objects
  • Note about the example's error handling
    • errors are captured but not handled
    • “We hope your code is better than ours.”

Implementing git_odb_backend_mine

  • What it is
    • constructor for your own ODB implementation
  • Requirement
    • fill in the git_odb_backend structure properly
  • Example struct layout shown
    • typedef struct {
    • git_odb_backend parent;
    • // Some other stuff
    • void *custom_context;
    • } my_backend_struct;
  • Subtle memory-layout constraint
    • my_backend_struct's first member must be a git_odb_backend structure
    • ensures Libgit2 sees the memory layout it expects
  • Flexibility
    • the rest of the struct is arbitrary
    • can be as large or small as needed
  • Example initialization function responsibilities shown
    • allocate
      • backend = calloc(1, sizeof (my_backend_struct));
    • set custom context
      • backend->custom_context = …;
    • fill supported function pointers in parent
      • backend->parent.read = &my_backend__read;
      • backend->parent.read_prefix = &my_backend__read_prefix;
      • backend->parent.read_header = &my_backend__read_header;
      • // …
    • return it through output parameter
      • *backend_out = (git_odb_backend *) backend;
    • return success constant
      • return GIT_OK; (older Libgit2 releases named this constant GIT_SUCCESS)
  • Where to find full signatures
    • Libgit2 source file:
      • include/git2/sys/odb_backend.h
    • which signatures to implement depends on use case
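
As a language-neutral illustration of the contract behind those function pointers, the sketch below models a toy object store with the three read operations named above. It is an analogy rather than Libgit2's API: the class name, method signatures, and return shapes are assumptions, and a `write` method is included only so the example has something to read back.

```python
import hashlib


class InMemoryObjectStore:
    """Toy object store exposing the operations an ODB backend is asked for."""

    def __init__(self):
        self._objects = {}  # hex object ID -> (kind, payload bytes)

    def write(self, kind, payload):
        """Store an object and return its hex ID."""
        header = f"{kind} {len(payload)}".encode("ascii") + b"\0"
        oid = hashlib.sha1(header + payload).hexdigest()
        self._objects[oid] = (kind, payload)
        return oid

    def read(self, oid):
        """Full lookup: return (kind, payload) or raise KeyError."""
        return self._objects[oid]

    def read_header(self, oid):
        """Cheap lookup: return (kind, size) without returning the payload."""
        kind, payload = self._objects[oid]
        return kind, len(payload)

    def read_prefix(self, short_oid):
        """Resolve an abbreviated ID; return (full_oid, kind, payload)."""
        matches = [oid for oid in self._objects if oid.startswith(short_oid)]
        if not matches:
            raise KeyError(short_oid)
        if len(matches) > 1:
            raise ValueError(f"ambiguous prefix: {short_oid}")
        full = matches[0]
        return (full, *self._objects[full])


store = InMemoryObjectStore()
oid = store.write("blob", b"hello world\n")
```

A real backend would keep the same division of labor but store the data wherever the application needs it, for example a database or a remote service.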

Other Bindings (Libgit2)

  • Breadth
    • bindings exist for many languages
  • Section purpose
    • show small examples using a few more complete bindings packages (as of writing)
  • Other languages mentioned as having libraries (various maturity)
    • C++
    • Go
    • Node.js
    • Erlang
    • JVM
  • Official collection of bindings
  • Common goal for the code in this section
    • return the commit message from the commit eventually pointed to by HEAD
    • “sort of like git log -1”

LibGit2Sharp

  • For
    • .NET or Mono applications
  • URL
  • Characteristics
    • bindings written in C#
    • wraps raw Libgit2 calls with native-feeling CLR APIs
  • Example program (single expression)
    • new Repository(@"C:\path\to\repo").Head.Tip.Message;
  • Desktop Windows note
    • NuGet package available to get started quickly

objective-git

  • Platform context
    • Apple platform
    • likely using Objective-C as implementation language
  • URL
  • Example program outline
    • initialize repo
      • GTRepository *repo =
      • [[GTRepository alloc] initWithURL:[NSURL fileURLWithPath: @"/path/to/repo"]
      • error:NULL];
    • retrieve commit message
      • NSString *msg = [[[repo headReferenceWithError:NULL] resolvedTarget] message];
  • Swift note
    • objective-git is fully interoperable with Swift

pygit2

  • What it is
    • Python bindings for Libgit2
  • URL
  • Example program (chained calls, wrapped in parentheses so the chain can span several lines)
    • message = (
    • pygit2.Repository("/path/to/repo") # open repository
    • .head # get the current branch
    • .peel(pygit2.Commit) # walk down to the commit
    • .message # read the message
    • )

Further Reading (Libgit2)

JGit

  • Purpose
    • use Git from within a Java program
  • What it is
    • JGit: a relatively full-featured Git library
    • implemented natively in Java
    • widely used in the Java community
    • under the Eclipse umbrella
  • Home

Getting Set Up (JGit)

  • Multiple ways to connect project to JGit
  • Easiest path: Maven
    • add dependency snippet to <dependencies> in pom.xml
      • <dependency>
      • <groupId>org.eclipse.jgit</groupId>
      • <artifactId>org.eclipse.jgit</artifactId>
      • <version>3.5.0.201409260305-r</version>
      • </dependency>
    • version note
    • result
      • Maven automatically acquires and uses the JGit libraries you need
  • Manual dependency management
    • pre-built binaries
    • compile/run examples
      • javac -cp .:org.eclipse.jgit-3.5.0.201409260305-r.jar App.java
      • java -cp .:org.eclipse.jgit-3.5.0.201409260305-r.jar App

Plumbing (JGit)

  • Two levels of API
    • plumbing
    • porcelain
  • Terminology source: Git itself
    • porcelain APIs
      • friendly front-end for common user-level actions
      • like what a normal user would use the Git command-line tool for
    • plumbing APIs
      • interact with low-level repository objects directly

Starting point: Repository

  • Starting point for most JGit sessions
    • class: Repository
  • Creating/opening a filesystem-based repository
    • note: JGit also allows other storage models
    • Create new repository
      • Repository newlyCreatedRepo = FileRepositoryBuilder.create(new File("/tmp/new_repo/.git"));
      • newlyCreatedRepo.create();
    • Open existing repository
      • Repository existingRepo = new FileRepositoryBuilder()
      • .setGitDir(new File("my_repo/.git"))
      • .build();

FileRepositoryBuilder (finding repositories)

  • Builder style
    • fluent API
  • Helps locate a Git repository
    • whether or not your program knows exactly where it's located
  • Methods/strategies mentioned
    • environment variables
      • .readEnvironment()
    • search starting from working directory
      • .setWorkTree(…).findGitDir()
    • open known .git directory
      • .setGitDir(...) (as in example)
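
The search strategies listed above can be sketched in a few lines. This is a simplified model, not JGit's implementation: it honors only the GIT_DIR environment variable and then walks upward from a starting directory looking for a `.git` directory. Real Git also handles `.git` files that point elsewhere, bare repositories, and ceiling directories. The function name `find_git_dir` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def find_git_dir(start, environ=None):
    """Locate a repository's .git directory, or return None.

    Order used here (a simplification): an explicit GIT_DIR environment
    variable wins; otherwise walk from `start` up to the filesystem root.
    """
    environ = os.environ if environ is None else environ
    explicit = environ.get("GIT_DIR")
    if explicit:
        return Path(explicit)
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        candidate = directory / ".git"
        if candidate.is_dir():
            return candidate
    return None


# Usage: build a throwaway tree and search from a nested directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    nested = root / "src" / "pkg"
    nested.mkdir(parents=True)
    found = find_git_dir(nested, environ={})
    found_is_root_git = found == root / ".git"
    overridden = find_git_dir(nested, environ={"GIT_DIR": "/elsewhere/.git"})
```

Passing the environment in as a parameter keeps the lookup deterministic, which matters when an application may be launched with stray Git variables set.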

Plumbing API: quick sampling + explanations

  • Sampling actions shown (code outline)

    • Get a reference
      • Ref master = repo.getRef("master");
    • Get object ID pointed to by reference
      • ObjectId masterTip = master.getObjectId();
    • Rev-parse
      • ObjectId obj = repo.resolve("HEAD^{tree}");
    • Load raw object contents
      • ObjectLoader loader = repo.open(masterTip);
      • loader.copyTo(System.out);
    • Create a branch
      • RefUpdate createBranch1 = repo.updateRef("refs/heads/branch1");
      • createBranch1.setNewObjectId(masterTip);
      • createBranch1.update();
    • Delete a branch
      • RefUpdate deleteBranch1 = repo.updateRef("refs/heads/branch1");
      • deleteBranch1.setForceUpdate(true);
      • deleteBranch1.delete();
    • Config
      • Config cfg = repo.getConfig();
      • String name = cfg.getString("user", null, "name");
  • Explanation: references (Ref)

    • repo.getRef("master")
      • JGit automatically grabs the actual master ref at refs/heads/master
      • returns a Ref object for reading information about the reference
    • Ref info available
      • name: .getName()
      • direct reference target object: .getObjectId()
      • symbolic reference target reference: .getTarget()
    • Ref objects also used for
      • tag refs
      • tag objects
    • Tag “peeled” concept
      • peeled = points to final target of a (potentially long) string of tag objects
  • Explanation: object IDs (ObjectId)

    • represents SHA-1 hash of an object
    • object might or might not exist in the object database
  • Explanation: rev-parse (repo.resolve(...))

    • accepts any object specifier Git understands
    • returns
      • a valid ObjectId, or
      • null
    • reference: “see Branch References”
  • Explanation: raw object access (ObjectLoader)

    • can stream contents
      • ObjectLoader.copyTo(...)
    • other capabilities mentioned
      • read type and size of object
      • return contents as a byte array
    • large object handling
      • when .isLarge() is true
      • .openStream() returns an InputStream-like object
        • reads raw data without pulling everything into memory at once
  • Explanation: creating a branch (RefUpdate)

    • create RefUpdate
    • set new object ID
    • call .update() to trigger change
  • Explanation: deleting a branch

    • requires .setForceUpdate(true)
      • otherwise .delete() returns REJECTED
      • and nothing happens
  • Explanation: config (Config)

    • get via repo.getConfig()
    • example value read
      • user.name via cfg.getString("user", null, "name")
    • config resolution behavior
      • uses repository for local configuration
      • automatically detects global and system config files
      • reads values from them as well
  • Error handling in JGit (not shown in code sample)

    • handled via exceptions
    • may throw standard Java exceptions
      • example: IOException
    • also has JGit-specific exceptions (examples)
      • NoRemoteRepositoryException
      • CorruptObjectException
      • NoMergeBaseException
  • Scope note

    • this is only a small sampling of the full plumbing API
    • many more methods/classes exist
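
Two ideas from the reference explanation above, short names expanding to full ref names and symbolic refs pointing at other refs, can be sketched with a plain dictionary. This is an illustrative model only: the prefix list follows Git's documented lookup order in simplified form, the `ref: ` marker mirrors how a symbolic ref such as HEAD is stored on disk, and the function names are hypothetical.

```python
SEARCH_PREFIXES = ["", "refs/", "refs/tags/", "refs/heads/", "refs/remotes/"]


def find_ref(refs, short_name):
    """Expand a short name such as 'master' to the first matching full name."""
    for prefix in SEARCH_PREFIXES:
        full = prefix + short_name
        if full in refs:
            return full
    return None


def resolve(refs, name, max_depth=5):
    """Follow 'ref: <target>' indirections until a direct object ID is found.

    Returns (final_ref_name, object_id).
    """
    for _ in range(max_depth):
        value = refs[name]
        if not value.startswith("ref: "):
            return name, value
        name = value[len("ref: "):]
    raise RuntimeError("symbolic reference chain too deep")


refs = {
    "HEAD": "ref: refs/heads/master",
    "refs/heads/master": "57fbe010446356833a6ad1600059d80b1e731e15",
}

full_name = find_ref(refs, "master")
head_target = resolve(refs, "HEAD")
```

Peeling a tag follows the same shape one level down: instead of following ref names, the loop would follow tag objects until it reaches a non-tag object.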

Porcelain (JGit)

  • Why porcelain exists
    • plumbing APIs are rather complete
    • but can be cumbersome to string together for common goals
      • adding a file to the index
      • making a new commit
  • Entry point class
    • Git
    • construction shown
      • Repository repo;
      • // construct repo...
      • Git git = new Git(repo);

Porcelain command pattern (Git class)

  • Pattern
    • Git methods return a command object
    • chain method calls to set parameters
    • execute via .call()

Example: like git ls-remote

  • Credentials
    • CredentialsProvider cp = new UsernamePasswordCredentialsProvider("username", "p4ssw0rd");
  • Command chain
    • Collection<Ref> remoteRefs = git.lsRemote()
    • .setCredentialsProvider(cp)
    • .setRemote("origin")
    • .setTags(true)
    • .setHeads(false)
    • .call();
  • Output loop
    • for (Ref ref : remoteRefs) {
    • System.out.println(ref.getName() + " -> " + ref.getObjectId().name());
    • }
  • What it requests
    • tags from origin
    • not heads
  • Authentication note
    • uses a CredentialsProvider

Other commands available through Git (examples listed)

  • add
  • blame
  • commit
  • clean
  • push
  • rebase
  • revert
  • reset

Further Reading (JGit)

go-git

  • When to use
    • integrate Git into a service written in Golang
  • What it is
    • pure Go library implementation
    • no native dependencies
      • not prone to manual memory management errors
    • transparent to standard Golang performance analysis tooling
      • CPU profilers
      • memory profilers
      • race detector
      • etc.
  • Focus
    • extensibility
    • compatibility
  • Compatibility / API coverage note

Basic go-git example

  • Import
    • import "github.com/go-git/go-git/v5"
  • Clone
    • r, err := git.PlainClone("/tmp/foo", false, &git.CloneOptions{
    • URL: "https://github.com/go-git/go-git",
    • Progress: os.Stdout,
    • })

After you have a Repository instance

  • “Access information and perform mutations”
  • Example operations shown
    • Get branch pointed by HEAD
      • ref, err := r.Head()
    • Get commit object pointed by ref
      • commit, err := r.CommitObject(ref.Hash())
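    • Get commit history (via Repository.Log in go-git v5; needs the github.com/go-git/go-git/v5/plumbing/object import)
      • iter, err := r.Log(&git.LogOptions{From: commit.Hash})
    • Iterate commits and print each
      • err = iter.ForEach(func(c *object.Commit) error {
      • fmt.Println(c)
      • return nil
      • })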

Advanced Functionality (go-git)

  • Feature: pluggable storage system
    • similar to Libgit2 backends
    • default implementation: in-memory storage
      • “very fast”
    • example: clone into memory storage
      • r, err := git.Clone(memory.NewStorage(), nil, &git.CloneOptions{
      • URL: "https://github.com/go-git/go-git",
      • })
  • Storage options example
  • Feature: flexible filesystem abstraction
  • Advanced use-case: fine-tunable HTTP client
    • example referenced:
    • custom client shown
      • customClient := &http.Client{
      • Transport: &http.Transport{ // accept any certificate (might be useful for testing)
      • TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
      • },
      • Timeout: 15 * time.Second, // 15 second timeout
      • CheckRedirect: func(req *http.Request, via []*http.Request) error {
      • return http.ErrUseLastResponse // don't follow redirect
      • },
      • }
    • override protocol handling
      • client.InstallProtocol("https", githttp.NewClient(customClient))
      • purpose: override http(s) default protocol to use custom client
    • clone using new client (for https://)
      • r, err := git.Clone(memory.NewStorage(), nil, &git.CloneOptions{URL: url})

Further Reading (go-git)

Dulwich

  • What it is
    • pure-Python Git implementation: Dulwich
  • Project hosting / site
  • Goal
    • interface to Git repositories (local and remote)
    • does not call out to git directly
    • uses pure Python instead
  • Performance note
    • optional C extensions
      • significantly improve performance
  • API design
    • follows Git design
    • separates two API levels
      • plumbing
      • porcelain

Dulwich plumbing example (lower-level API)

  • Goal
    • access the commit message of the last commit
  • Code and shown outputs
    • from dulwich.repo import Repo
    • r = Repo('.')
    • r.head()
      • # '57fbe010446356833a6ad1600059d80b1e731e15'
    • c = r[r.head()]
    • c
      • # <Commit 57fbe010446356833a6ad1600059d80b1e731e15>
    • c.message
      • # 'Add note about encoding.\n'

Dulwich porcelain example (high-level API)

  • Goal
    • print a commit log using porcelain API
  • Code and shown outputs
    • from dulwich import porcelain
    • porcelain.log('.', max_entries=1)
      • #commit: 57fbe010446356833a6ad1600059d80b1e731e15
      • #Author: Jelmer Vernooij <jelmer@jelmer.uk>
      • #Date: Sat Apr 29 2017 23:57:34 +0000

Further Reading (Dulwich)