Deep Dives

Deep-dive explorations of technical projects and findings.

These are not quick study guides or cheat sheets; treat them as engineering case studies. They are useful for three things: reviewing context, motivation, tradeoffs, and bottlenecks before an interview; consulting the “Risks & Mitigations” and “Gap Analysis” sections to avoid reinventing the wheel on similar projects; and measuring personal growth over time by seeing how past decisions were framed.

AI/ML Workshop — ml , onboarding , privacy
A set of practical, reproducible machine learning examples (PyTorch, Hugging Face, NumPy) with MPS-aware benchmarks and experiment hygiene for local hardware.
Chowist — extensibility , monitoring , tooling
A decade-spanning food discovery app migrated from Sinatra (Ruby) to Rails and then to Django. Covers lessons on incremental framework migrations and sustaining a single codebase through ecosystem shifts.
Grit — algorithms , extensibility , performance and +2 more
A from-scratch Git implementation in Rust, exploring content-addressable storage, plumbing/porcelain layering, and high-performance object caching.
Mailprune — data-pipelines , monitoring , networking and +1 more
A local-first email auditing and automated cleanup tool that identifies noisy senders and delivers actionable, privacy-preserving recommendations.
Photohaul — deduplication , extensibility , media and +1 more
A Java-based tool for organizing and migrating large photo collections, featuring deduplication, automatic metadata preservation, and resumable execution.
Ragchain — ml , privacy , retrieval
A local RAG stack (ChromaDB + Ollama) for private, reproducible retrieval and LLM inference, focusing on hybrid retrieval strategies and index versioning.
Rustoku — algorithms , performance , rust
An optimized Sudoku engine written in Rust, featuring human-like solving techniques, multi-platform support (Python, WASM), and microsecond-level performance.
Spark Trial — data-pipelines , etl , monitoring and +1 more
An end-to-end ETL example using Apache Spark on large-scale Parquet datasets, focusing on strict schema handling, partitioning, and reproducible aggregations.
Streaming Frameworks — data-pipelines , monitoring , streaming
An architectural comparison of streaming pipelines, evaluating Apache Beam's portable unified model (Java/DirectRunner) against Apache Flink's native API for stateful processing and fault tolerance.
Video Analysis — extensibility , feature-extraction , media and +1 more
An exploration of multimodal video feature extraction, comparing Apple-native frameworks (Vision, AVFoundation, Core Image) against cross-platform C++/Python toolchains (OpenCV, pybind11) for ML data preparation.
VirtuC — algorithms , compiler , performance and +1 more
A from-scratch compiler written in Rust for a subset of C that emits LLVM IR, focusing on AST design, semantic checking, and IR verification.