Thor, Six Weeks On: Building a Production-Grade AI Stack at Home

Fri, 19 Jun 2026 00:00:00 +0000

In the first Thor post I stood up a private AI server on an NVIDIA Jetson AGX Thor — 128GB of unified memory, a Blackwell GPU — and ran three inference backends side by side: Ollama, vLLM, and a hand-compiled TensorRT-LLM engine, all behind one OpenAI-compatible API, with a dashboard showing live tok/s. I was pretty pleased with it.
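Putting three backends behind one OpenAI-compatible API comes down to a thin routing layer. Here is a minimal sketch of that idea. It assumes a hypothetical naming scheme in which a model-name prefix (`ollama/`, `vllm/`, `trtllm/`) selects the backend. The `BACKENDS` table, the `route` and `tokens_per_second` helpers, and the TensorRT-LLM port are all illustrative and do not describe the actual Thor configuration. Ollama (port 11434, under `/v1`) and vLLM's OpenAI-compatible server (port 8000) do use those ports by default.

```python
from dataclasses import dataclass

# Hypothetical routing table: model-name prefix -> OpenAI-compatible base URL.
# Ollama defaults to 11434 and vLLM to 8000; the TensorRT-LLM port is an assumption.
BACKENDS = {
    "ollama/": "http://localhost:11434/v1",
    "vllm/": "http://localhost:8000/v1",
    "trtllm/": "http://localhost:8001/v1",
}


@dataclass
class Route:
    base_url: str  # where the request should be forwarded
    model: str     # model name with the routing prefix stripped


def route(model: str) -> Route:
    """Pick a backend by prefix and strip the prefix before forwarding."""
    for prefix, base_url in BACKENDS.items():
        if model.startswith(prefix):
            return Route(base_url, model[len(prefix):])
    raise ValueError(f"no backend registered for model {model!r}")


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput over the measured interval, as a dashboard might display it."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completion_tokens / elapsed_s
```

A client would send `model="vllm/llama-3.1-8b"` (a hypothetical model name), and the router would forward the request to `route(...).base_url + "/chat/completions"` with the bare model name. Because every backend speaks the same request format, the router only has to choose a URL. Note that dividing by total elapsed time also counts prefill. A stricter decode rate would exclude time-to-first-token.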

Six weeks of actually using it reframed the entire project for me. The inference was the fun part — the exploration. But the thing I’m actually building isn’t a chatbot on a Jetson. It’s a production-grade AI platform that happens to run in my house: the same disciplines I’d demand of any system serving real traffic at work — observability and scoring, caching, token-cost optimization, evaluators, guardrails, and end-to-end auditability — applied to a stack where nothing touches the cloud and I own every layer.
