AI / Automation
Local AI voice assistant with 18 tools, semantic memory, and autonomous work mode. Runs entirely on consumer hardware without cloud access.
Project Jarvis is what every comic book fan wants: a personal AI assistant that knows you and can actually help at any moment. Not "here's a web search" help. Real help. The difference between Jarvis and something like Siri or Alexa is fundamental — those tools don't think. They're keyword-driven pattern matchers. That's why Siri usually sends you to Safari or hands you off to another app. It doesn't have an LLM reasoning about your request. It doesn't have the knowledge to go find information and synthesize an answer. It just matches keywords to pre-programmed actions.
Jarvis runs on local LLMs through LM Studio. The entire system — speech-to-text via Whisper, language model inference, text-to-speech via Piper — runs on a single Apple Silicon Mac without touching the internet. That was a deliberate architectural choice, and it's not primarily about privacy.
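As a rough sketch of what a single voice turn looks like, the snippet below wires the three stages together. LM Studio exposes an OpenAI-compatible HTTP API on localhost, which is what `ask_local_llm` assumes; the model name, URL port, and the injected `transcribe`/`speak` callables (stand-ins for Whisper and Piper) are illustrative placeholders, not the project's actual code.

```python
import json
import urllib.request

# LM Studio serves an OpenAI-compatible endpoint locally; port 1234 is its default,
# but adjust to match your setup. Nothing here leaves the machine.
LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"

def ask_local_llm(messages, model="local-model"):
    """Send a chat request to the local LM Studio server (model name is a placeholder)."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    req = urllib.request.Request(
        LM_STUDIO_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

def voice_loop(transcribe, infer, speak, get_audio):
    """One assistant turn: audio -> text -> LLM reply -> speech.

    The three stages are injected as callables so Whisper (speech-to-text),
    the LLM, and Piper (text-to-speech) can each be swapped or stubbed out.
    """
    text = transcribe(get_audio())
    reply = infer([{"role": "user", "content": text}])
    speak(reply)
    return text, reply
```

Injecting the stages as plain functions also makes the loop testable without any models loaded, which matters when you iterate on the orchestration logic more often than on the models themselves.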
The real motivation is accessibility. If I can build a useful AI assistant that runs on consumer hardware with free, open source models, then anyone with a computer can have one. Understanding how these models behave in custom environments — what they're capable of, where they break, how to work around their limitations — gives me insight into using them effectively in any context. These models are free, they're getting smaller and more powerful every few months, and they run on hardware that's four years old.
The hardest part isn't the concept. It's getting local models that aren't smart enough for general conversation to behave correctly in a custom environment with real consequences. When an LLM has 18 tools at its disposal, including the ability to create and delete files, execute code, and manage your calendar, you need guardrails that go beyond "don't say bad things." The model doesn't understand that its environment is live. It doesn't know the difference between a test sandbox and your actual filesystem. Without custom workspace boundaries, a confused model could navigate somewhere sensitive and delete something critical.
So the system has layered safety controls: workspace boundaries that physically prevent the model from accessing paths outside its designated area, confirmation requirements for destructive operations, and structured output validation that catches malformed tool calls before they execute. The guardrails aren't restrictions on the model's intelligence. They're environmental protections that let the model operate freely within safe bounds.
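A minimal sketch of those three layers might look like the following. The workspace root, the set of destructive tool names, and the tool-call JSON shape (`{"tool": ..., "args": ...}`) are all assumptions for illustration; the key idea is that path resolution, a confirmation gate, and structural validation each run before anything executes.

```python
import json
from pathlib import Path

WORKSPACE = Path("/Users/me/jarvis-workspace").resolve()  # hypothetical workspace root
DESTRUCTIVE = {"delete_file", "execute_code"}             # illustrative tool names

def resolve_in_workspace(requested: str) -> Path:
    """Resolve a model-supplied path and refuse anything outside the workspace.

    resolve() collapses '..' segments and symlinks, so tricks like
    '../../etc/passwd' or an absolute path are caught before any file I/O.
    """
    target = (WORKSPACE / requested).resolve()
    if not target.is_relative_to(WORKSPACE):  # Python 3.9+
        raise PermissionError(f"{requested!r} escapes the workspace")
    return target

def validate_tool_call(raw: str, known_tools: set) -> dict:
    """Reject malformed or unknown tool calls before they reach the executor."""
    call = json.loads(raw)  # raises on non-JSON model output
    if call.get("tool") not in known_tools:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    if not isinstance(call.get("args"), dict):
        raise ValueError("args must be an object")
    return call

def gate_tool_call(name: str, args: dict, confirm) -> bool:
    """Require an explicit yes from the user before destructive tools run."""
    if name in DESTRUCTIVE:
        return confirm(f"Allow {name} with {args}?")
    return True
```

Because these checks live in the environment rather than in the prompt, they hold even when the model is confused; a bad tool call fails closed instead of relying on the model to police itself.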
I've tested Jarvis across a range of models: Qwen, Gemma, Mistral, and several others. The larger models can handle impressive multi-step tasks: conducting a full research phase on a topic, creating custom tools on the fly, and chaining tool calls across different domains. But the real engineering challenge is long-term memory.
The memory system works in layers. A vector database automatically archives every interaction and its outputs. But raw archives grow fast and become expensive to search, so there's a compaction layer: a secondary, smaller LLM reads each session and produces a progressive summary. This keeps the vector store lean while preserving the important context. On top of that, there's an active keyword-driven memory system that pulls relevant past context from the vector store and injects it into the current prompt. The main LLM never has to manage its own memory. The system handles recall externally, keeping the context window focused on the current task while still having access to everything that came before.
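The flow above can be sketched with toy stand-ins. Here the secondary LLM is an injected `summarize` callable, and keyword overlap stands in for the real vector search; the function names and prompt format are illustrative, not the project's actual API.

```python
def compact_session(turns, summarize):
    """Progressive summarization: fold each turn into a running summary.

    `summarize` stands in for the secondary, smaller LLM that compacts
    sessions so the vector store stays lean.
    """
    summary = ""
    for turn in turns:
        summary = summarize(summary, turn)
    return summary

def recall(query, archive, top_k=2):
    """Keyword-driven recall: rank archived snippets by word overlap with
    the query. A toy stand-in for similarity search over the vector store."""
    words = set(query.lower().split())
    scored = sorted(
        archive,
        key=lambda s: len(words & set(s.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query, archive):
    """Inject recalled context into the prompt, so the main LLM never has
    to manage its own memory."""
    context = "\n".join(recall(query, archive))
    return f"Relevant past context:\n{context}\n\nUser: {query}"
```

The design choice worth noting is that recall happens entirely outside the model: the main LLM only ever sees a prompt that already contains the relevant history, so its context window stays focused on the current task.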
The result is an assistant that remembers your preferences, your ongoing projects, your communication style — and can pick up where you left off across sessions without you having to re-explain the context every time. It's not perfect yet. The local models still struggle with nuanced reasoning that cloud models handle easily. But the gap is closing fast, and the architecture is ready for whatever the next generation of open source models brings.
File operations, web search, code execution, calendar management, and custom tool creation on the fly.
Vector DB with progressive summarization via secondary LLM, plus keyword-driven active recall injected into prompts.
Workspace boundaries, confirmation gates for destructive ops, and structured output validation before tool execution.
Tested across Qwen, Gemma, Mistral, and others. Runs on consumer Apple Silicon with no cloud dependency.