AI Doesn't Require a Supercomputer

I’ve been experimenting with building AI agents lately.

Most people think you only have two choices when it comes to LLMs: you either pay a "tax" for every prompt via a cloud API key, or you go out and buy a $5,000 rig with enough GPU power to heat your house.

I’m here to tell you that’s a false choice.

I just finished building a coding assistant using Kotlin, the Koog framework, and Ollama. I’m running a massive 480B parameter model (Qwen3 Coder) through a hybrid cloud setup.

The best part? It cost me almost nothing to get running, and I didn't have to upgrade a single piece of hardware.

Here is what I learned about building systems that actually work for you.

1. The "Name" Trap

The biggest headache wasn't the code. It was the labels.

You’ll run ollama list, see a model name, plug it into your code, and… nothing. It fails. The name the CLI shows you isn't always the ID the API expects — CLI tools and APIs often speak different languages.

In management, we call this a lack of clarity. In engineering, we call it a bug. Either way, it’s a reminder: never assume the system works the way the manual says it does. Verify the connection early, or you’ll spend three hours debugging logic that wasn't broken in the first place.
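One cheap way to verify early is to ask Ollama's REST API which model IDs it actually serves (GET /api/tags on the default port 11434) and compare against the ID you hard-coded. A minimal sketch using only the JDK's HttpClient; the helper names are mine, and the regex is a deliberately naive substitute for a real JSON library — fine for a startup sanity check:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Pull every "name" field out of an /api/tags JSON payload.
// A real project would use a JSON library; a regex is enough for a sanity check.
fun modelIds(json: String): List<String> =
    Regex("\"name\"\\s*:\\s*\"([^\"]+)\"")
        .findAll(json)
        .map { it.groupValues[1] }
        .toList()

// Ask the local Ollama server what it actually serves.
fun listOllamaModels(host: String = "http://localhost:11434"): List<String> {
    val request = HttpRequest.newBuilder(URI.create("$host/api/tags")).GET().build()
    val body = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
    return modelIds(body)
}
```

A single check("qwen3-coder:480b-cloud" in listOllamaModels()) at startup turns three hours of debugging working logic into one immediate, readable failure.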

2. Efficiency is a Leadership Trait

I didn't want to manage a fleet of API keys or worry about per-request billing. That’s just "political theater" for your bank account.

By using Ollama as the bridge, I kept the logic local and the data private. I used a hybrid approach—keeping the framework on my machine but letting the heavy lifting happen where the resources already existed.

It’s about leverage. You don't need to own the hardware to direct the output. You just need a clear interface.

3. Empathy for Your Hardware

We talk a lot about burnout in people, but we ignore it in systems.

If I tried to run a 480B model locally on my current laptop, it wouldn't just be slow—it would be unusable. Recognizing your constraints isn't a weakness; it’s a strategy.
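"Unusable" here is just arithmetic. At 2 bytes per parameter (FP16), 480 billion parameters need roughly 960 GB of memory before you've loaded a single token of context; even aggressive 4-bit quantization still wants around 240 GB. A quick back-of-envelope check:

```kotlin
// Rough memory needed just to hold the model weights.
// Ignores KV cache, activations, and runtime overhead — the real number is higher.
fun weightGigabytes(paramCount: Double, bytesPerParam: Double): Double =
    paramCount * bytesPerParam / 1e9

// Qwen3 Coder: 480 billion parameters.
val fp16Gb = weightGigabytes(480e9, 2.0) // FP16: 960 GB
val q4Gb = weightGigabytes(480e9, 0.5)   // 4-bit quantized: 240 GB
```

No laptop ships with that much memory, which is exactly why the "-cloud" variant exists.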

I’m starting with this hybrid cloud approach now so I can ship today. As my hardware grows, I’ll migrate the models locally.

Results and retention are the same metric. If I burn out my machine (or my budget) today, I can’t deliver tomorrow.

The Build

// Imports assume Koog's package layout — verify against the Koog version you install.
import ai.koog.agents.core.agent.AIAgent
import ai.koog.prompt.executor.llms.all.simpleOllamaAIExecutor
import ai.koog.prompt.llm.LLMCapability
import ai.koog.prompt.llm.LLMProvider
import ai.koog.prompt.llm.LLModel

val agent = AIAgent(
    promptExecutor = simpleOllamaAIExecutor(), // talks to the local Ollama server
    systemPrompt = "You are a coding assistant. Be concise and helpful.",
    llmModel = LLModel(
        provider = LLMProvider.Ollama,
        id = "qwen3-coder:480b-cloud", // Watch that ID closely — it must match what the API serves.
        capabilities = listOf(LLMCapability.Temperature, LLMCapability.Completion),
        contextLength = 4096L
    )
)

Why This Matters

I do this because I believe management—and engineering—is about straight talk.

We overcomplicate AI because it makes us feel like we're doing "important" work. But the most effective systems are usually the simplest ones.

You don't need a massive budget. You don't need a server farm. You just need to know how to connect the dots and get the noise out of the way.

Stop optimizing for optics. Start optimizing for momentum.
