Grit

A modular version control system with unique architecture.

Repositories: 🐙 grit

Context & Motivation

Context: Grit is a from-scratch Git implementation in Rust, providing both low-level plumbing commands (hash-object, cat-file, write-tree) and high-level porcelain commands (init, add, commit, log, status, diff, reset). The architecture follows Git’s own layered design where porcelain is composed entirely from plumbing primitives.

Motivation: We often take Git for granted, yet its ubiquity often obscures the elegance of its underlying content-addressable storage model. Building a Git implementation from scratch is a deep systems challenge—requiring a rigorous understanding of object models (blobs, trees, commits), storage formats (SHA-1, zlib), and the index structure. This project aims to achieve full format compatibility while optimizing for performance and structural clarity.

The Local Implementation

  • Current Logic: The architecture mirrors Git’s canonical layered design. The object model implements blobs, trees, and commits as typed Rust structs, utilizing for SHA-1 hashing and zlib compression via flate2. Items persist in .grit/objects/ using standard header/content formatting. The index (staging area) maintains a sorted entry list, ensuring compatibility with Git’s binary index format. Porcelain commands orchestrate plumbing primitives: grit add hashes file content and updates the index, while grit commit serializes the index into a tree and creates a commit with parent references.
  • Plumbing/porcelain split: this layering is the core architectural decision. Plumbing commands (hash-object, cat-file, write-tree, read-tree, update-ref) each do one thing and expose it via both CLI and library API. Porcelain commands (add, commit, status, log, diff, reset) are implemented exclusively by composing plumbing functions—no porcelain command accesses .grit/ directly. This means power users and scripts can automate at the plumbing level, and every porcelain command is testable by verifying its plumbing calls.
  • LRU caching strategy: an LRU cache (lru crate, default 1024 entries) sits between object read requests and disk IO. Object lookups first check the cache by SHA; cache misses decompress from disk and populate the cache. Tree objects are cached separately since they’re frequently re-read during status and diff operations (walking the tree for every status call without caching would read the same tree objects O(depth × files) times). Cache hit rates typically exceed 80% for status and log on repositories with stable working trees.
  • Bottleneck: Full Git compatibility for complex operations (merge, rebase, worktrees) requires significant additional implementation. Performance scaling for large repositories (100k+ objects) depends on efficient packfile support, which is not yet implemented—all objects are stored loose. Status checks on large working trees require efficient filesystem traversal with .gritignore pattern support.

Comparison to Industry Standards

  • My Project: High-performance, educational Git implementation emphasizing the plumbing/porcelain architecture, LRU caching for read performance, and property-based correctness testing. Achieves 3–4× speedups over Git on small-repo micro-benchmarks (init ~2.1ms at 4.1×, add ~4.1ms at 3.4×, status ~5.4ms at 3.3×, commit ~6.0ms at 4.2×) due to lower startup overhead and the Myers diff algorithm implementation.
  • Industry: Git itself is highly optimized with decades of development—packfile deltification, bitmap indices for reachability, and multi-pack-index for large repos. Libgit2 provides a C library with comprehensive API coverage. Gitoxide (another Rust implementation) targets full compatibility with extensive packfile support.
  • Gap Analysis: To approach Git’s robustness: implement packfile support (delta compression, pack-index) for storage efficiency on large repos, add merge and rebase with three-way merge algorithms, support .gritignore patterns for status filtering, implement fetch/push with Git’s smart HTTP and SSH transport protocols, and consider SHA-256 object format support (available since Git 2.29+) for forward compatibility as the ecosystem migrates away from SHA-1.

Risks & Mitigations

  • Compatibility issues: strict adherence to Git’s object and index formats with byte-level tests. Run grit operations on repositories created by Git and verify with git fsck that no corruption is introduced.
  • Performance regressions: Criterion benchmarks run in CI with statistical comparison. Alert on P95 regressions exceeding 5%. Monitor cache hit rates—a sudden drop indicates a code change is bypassing the cache.
  • Large repo degradation: without packfiles, storage grows linearly with object count. Document the current limitation and provide a roadmap for packfile support. In the interim, recommend grit for repositories under 10k objects.
  • Index corruption: validate index invariants (sorted entries, no duplicates, valid modes) on every read and write. Fail fast with a clear error rather than silently producing a malformed index that Git can’t read.