Projects
A mix of independent builds and selected production work — infrastructure, blockchain, and self-hosted AI.
Independent Projects
home_stack — Bare-metal Kubernetes homelab, PXE-booted Talos
A network-reprovisionable bare-metal Kubernetes cluster: Minisforum MS-01 nodes netboot Talos Linux from a PXE server I run on a Synology NAS — no installer media, every node rebuildable from the network. Built end-to-end and documented honestly, post-mortems and all.
- Provisioning as L0 — dnsmasq proxyDHCP + TFTP and an HTTP asset server in a Synology Container Manager project, coexisting with a router DHCP it can’t configure; firmware → iPXE chainload → Talos over HTTP
- Declarative, immutable cluster — Talos v1.13.2 (API-driven, no SSH), MAC-pinned static networking via deviceSelector, control-plane VIP, Cilium with kubeProxyReplacement over Talos KubePrism (kube-proxy disabled); configs regenerate from a patch + a persistent secrets bundle
- A platform, not just a cluster (Part 1) — Flux GitOps owning a layered controllers → config → apps tree; Cilium LB-IPAM + L2 announcements load-balancing onto a flat LAN (the IP pool carved out of UniFi DHCP via the controller API); Tailscale Ingress exposure; two storage classes (node-local for DBs, NFS for bulk media); the full media-automation stack + qBittorrent-over-gluetun migrated without reconfiguring a single inter-service URL; a VictoriaMetrics/VictoriaLogs/Grafana observability stack — 21 HelmReleases green, 24/24 dashboard endpoints reachable
- Honest build log — public repo with the commit history as the record, plus post-mortems of every way it bit me — KSPP cmdline, CA rotation, and an interface-naming deadlock in Part 0; the Tailscale-expose/Cilium socket-LB trap, the recursive-fsGroup chown, and the qBittorrent auth saga in Part 1
- Next — DNS HA behind a LoadBalancer VIP; a third node for real three-node etcd quorum
Talos Linux, Kubernetes, Cilium, Flux GitOps, SOPS, Tailscale, NFS/local-path CSI, VictoriaMetrics, iPXE/dnsmasq, Synology, Helm
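Carving a Cilium LB-IPAM pool out of a flat LAN only works if the pool never overlaps the range the router's DHCP server hands out. A minimal sketch of that sanity check follows. The subnet, DHCP range, and pool addresses are hypothetical placeholders, not the values used in home_stack, and the function name is illustrative.

```python
from ipaddress import IPv4Address, IPv4Network


def pool_is_safe(subnet: str, dhcp_range: tuple[str, str], lb_pool: tuple[str, str]) -> bool:
    """Return True when the LB pool sits inside the subnet and outside the DHCP range."""
    net = IPv4Network(subnet)
    dhcp_lo, dhcp_hi = (IPv4Address(a) for a in dhcp_range)
    pool_lo, pool_hi = (IPv4Address(a) for a in lb_pool)

    if pool_lo > pool_hi or dhcp_lo > dhcp_hi:
        raise ValueError("range start must not exceed range end")

    in_subnet = pool_lo in net and pool_hi in net
    # Two closed intervals are disjoint when one ends before the other begins.
    disjoint = pool_hi < dhcp_lo or pool_lo > dhcp_hi
    return in_subnet and disjoint


# Hypothetical addressing: DHCP limited to .100-.199, LB pool takes .200-.219.
SUBNET = "192.168.1.0/24"
DHCP = ("192.168.1.100", "192.168.1.199")
POOL = ("192.168.1.200", "192.168.1.219")

if __name__ == "__main__":
    print(pool_is_safe(SUBNET, DHCP, POOL))
```

A check like this could run before the pool manifest is committed, so an overlapping range fails in review rather than as a duplicate-IP conflict on the LAN.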
Thor — Self-hosted, production-grade private AI platform
Private AI platform on NVIDIA Jetson AGX Thor (Blackwell sm_110, 128 GB unified memory) — built to a production maturity bar with nothing touching the cloud. The first build stood up three concurrent inference backends (vLLM, JetPack-optimized Ollama, TensorRT-LLM) behind a unified OpenAI-compatible API, patching the TRT build system for Blackwell sm_110. Six weeks on, it consolidated onto one backend and grew the platform layers that matter.
- Production pillars (the rebuild) — a LiteLLM gateway with role aliases, retries, and virtual keys; Redis response caching; Langfuse tracing on every call for observability, scoring, and auditability; a Headroom proxy for context/token optimization; Hermes eval runners and a verified tool-calling route as guardrails
- Custom observability — aiohttp proxy in front of Ollama extracts per-generation tok/s and TTFT to Netdata via statsd; tegrastats and the LiteLLM gateway metrics integrated into live dashboards
- Hardened for real use — systemd boot persistence with proper dependency ordering, Traefik with a Tailscale cert resolver for HTTPS remote access, SOPS-managed secrets; browser voice chat from a phone over cellular without exposing anything publicly
- Next — a fixed eval set as a release gate, content-level guardrails, and cost-per-task scoring
Python, Docker, systemd, LiteLLM, Langfuse, Ollama, LangGraph, Redis, Postgres, Traefik, Tailscale, SOPS, ComfyUI, Netdata
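The per-generation metrics can be sketched roughly as below. This is not the Thor proxy itself. It assumes the timing fields Ollama reports in its final response (eval_count, plus eval_duration, load_duration, and prompt_eval_duration in nanoseconds) and approximates TTFT as load time plus prompt evaluation time, which is an assumption; a streaming proxy could time the first chunk directly instead. The metric prefix and sample values are hypothetical.

```python
import socket

NS_PER_S = 1_000_000_000


def generation_metrics(final_chunk: dict) -> dict:
    """Derive tok/s and an approximate TTFT from Ollama's final response fields."""
    eval_count = final_chunk.get("eval_count", 0)
    eval_ns = final_chunk.get("eval_duration", 0)
    tok_per_s = eval_count / (eval_ns / NS_PER_S) if eval_ns else 0.0
    # Assumption: TTFT is approximated as model load + prompt evaluation time.
    ttft_ms = (final_chunk.get("load_duration", 0)
               + final_chunk.get("prompt_eval_duration", 0)) / 1_000_000
    return {"tok_per_s": round(tok_per_s, 2), "ttft_ms": round(ttft_ms, 2)}


def statsd_lines(metrics: dict, prefix: str = "ollama") -> list[str]:
    """Format each metric as a statsd gauge: <name>:<value>|g."""
    return [f"{prefix}.{name}:{value}|g" for name, value in metrics.items()]


def send(lines: list[str], host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; 8125 is the conventional statsd port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode(), (host, port))


# Hypothetical final-response payload, durations in nanoseconds.
SAMPLE = {
    "eval_count": 120,
    "eval_duration": 4_000_000_000,
    "load_duration": 50_000_000,
    "prompt_eval_duration": 200_000_000,
}

if __name__ == "__main__":
    print(statsd_lines(generation_metrics(SAMPLE)))
```

Keeping the calculation separate from the UDP send makes the arithmetic testable without a statsd listener running.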
GoNFTme — Zero-fee Web3 crowdfunding
NFT-rewarded crowdfunding platform on Base (L2). Designed and shipped end-to-end — smart contracts, frontend, wallet integration, and security review.
- Smart contracts — Solidity on OpenZeppelin patterns (ERC-721 minting, campaign management, donation flow), built with Hardhat, deployed and verified on Base Sepolia testnet via BaseScan
- Frontend — Next.js 15 + TypeScript, Wagmi for MetaMask and Coinbase Wallet, NextAuth.js for admin auth
- Security-first — Zod input validation, SonarQube audits, OWASP Top 10 review, unit + E2E test suite — defensible posture before mainnet
Solidity, OpenZeppelin, Hardhat, Base L2, Next.js 15, TypeScript, Wagmi, NextAuth.js
TF_Staticsite — Terraform module for static sites
The same module that powers this site: SSL-secured static hosting on S3 with CloudFront CDN. ACM-issued certs, secure S3 bucket policies, Route 53 wiring, sane CloudFront defaults — drop in a few variables and you’re live.
Terraform, AWS S3, CloudFront, Route 53, ACM
Selected Production Work
Service operational-maturity scoring — Coinbase
Built an n8n workflow that pulls from GitHub, Snowflake, and Datadog to score every service on operational maturity — ILT coverage, Valkey migration status, multi-region readiness, ALB-to-service-mesh migration, Fort integration, and more — emailing a hyperlinked-evidence report every morning. Replaced manual cross-team audits and saved the team several thousand engineering hours per year.
n8n, GitHub API, Snowflake, Datadog API
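The production workflow lives in n8n, so the following is only an illustrative sketch of the scoring idea: each check carries a weight and a link to its evidence, and the score is the weighted share of passing checks. The check names echo the criteria above, while the weights, service name, and URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    weight: int
    passed: bool
    evidence_url: str  # link to the query or dashboard backing the result


def maturity_score(checks: list[Check]) -> float:
    """Weighted percentage of passing checks, rounded to one decimal place."""
    total = sum(c.weight for c in checks)
    if total == 0:
        return 0.0
    earned = sum(c.weight for c in checks if c.passed)
    return round(100 * earned / total, 1)


def report_line(service: str, checks: list[Check]) -> str:
    """One report row: the score plus each failing check with its evidence link."""
    failing = ", ".join(f"{c.name} ({c.evidence_url})" for c in checks if not c.passed)
    line = f"{service}: {maturity_score(checks)}/100"
    return f"{line}; failing: {failing}" if failing else line


# Hypothetical service and weights, for illustration only.
CHECKS = [
    Check("ilt_coverage", 3, True, "https://example.com/evidence/ilt"),
    Check("valkey_migration", 2, False, "https://example.com/evidence/valkey"),
    Check("multi_region_ready", 1, True, "https://example.com/evidence/regions"),
]

if __name__ == "__main__":
    print(report_line("example-service", CHECKS))
```

Attaching the evidence URL to each check keeps every score in the report traceable to its source data.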
Ethereum beacon-node topology redesign — Coinbase
Contributed to redesigning Ethereum staking node topology away from rigid 1:1:1 validator/beacon/execution pairings toward a one-to-many beacon node architecture — increasing redundancy per validator while reducing infrastructure footprint. Tuned bare-metal hosts (kernel, network, storage) for low-latency, high-throughput performance across multi-cloud, multi-region deployments. The difference between a healthy validator and a missed attestation is measured in milliseconds.
Ethereum consensus + execution clients, bare-metal Linux, multi-region networking
ECS → EKS migration with GitOps — Omaze
Built the Terraform foundation (EKS, VPC, Client VPN, Flux) and implemented GitOps with Flux + ArgoCD to declaratively manage cluster scaffolding — eliminating drift and making rollouts auditable and revertible instead of click-ops. Migrated CI/CD to GitHub Actions and standardized service delivery, removing per-team release toil. Delivered ~50% AWS savings via a Reserved Instance strategy across applicable services.
AWS EKS, Terraform, Flux, ArgoCD, GitHub Actions, Client VPN
SOC compliance + IaC migration — RingDNA
Migrated all production infrastructure to IaC (Terraform + Packer) so operational state was codified and auditable — a prerequisite for SOC certification and later Series B due diligence by Goldman Sachs. Achieved SOC compliance, opening the door to enterprise deals that required it as a procurement gate. Replaced manual ops jobs with Python Lambdas; upgraded internal infrastructure to end-to-end encryption.
Terraform, Packer, Jenkins, AWS Lambda, Airflow
Earlier — Onica, Beachbody, & before
- Onica — Designed a HIPAA-compliant Rancher application stack with Jenkins/Git CI for a healthcare client; led DevOps engineers and solution architects on AWS migration and managed-services engagements across regulated and non-regulated clients
- Beachbody — Stood up Puppet across 700+ pre-production and production hosts to standardize host configuration ahead of an AWS migration; migrated release tooling to SSL + LDAP and integrated services with SSO
- Earlier — Director of IT at Think Passenger (auto-scaling AWS SaaS, SSAE SOC 2 audits, >$1M/yr ops cost reduction) and APM Music (ESXi consolidation, ~$400K ops cost reduction); Orchestration Engineer at ReachLocal (Puppet automation, Cloudera Hadoop ownership)
Current Focus
- Interviewing for senior SRE / infrastructure roles — open to remote-first teams that value reliability, incident discipline, and well-codified production state
- Pushing the Thor stack toward an autonomous drone perimeter-sentinel — LLaVA-13B vision-language detection on live video frames, fully on-prem
- Hardening GoNFTme’s contract + frontend for mainnet