Dec 20, 2025

What you need to know about local model tooling and the steps for setting one up yourself

20 Comments

Thank you for all the comments here and on other platforms! I've added an edit to correct myself. I realized someone could make a financial decision based on this non-empirical conclusion, and I don't want that to happen.

Please see edit 2 at the top of the article for my corrections.

Dec 21

Isn't this missing the obvious comparison to running the *same* model you are able to run locally but through a cloud provider?

I'd be interested in seeing both cost and performance comparisons with running your setup through e.g. OpenRouter instead.

This is a good idea!

Dec 22 (Edited)

DigitalOcean will run GPT-OSS for you at 10 cents per million input tokens and 70 cents per million output tokens. It doesn't look like they let you use Qwen at all unless you rent a GPU instance. Those are extremely expensive; just buying the hardware yourself makes sense if you're going to leave it on for more than a couple of months.

I frequently hit the 30k token limit in Qwen using Aider. I don't know if they cache old requests, but I think it's usually safe to assume they don't. In that case you'd probably be spending a few dollars every day you're using a tool like this. So it's probably slightly cheaper than some of these subscription services, but over a year or so buying your own hardware still wins out if you're going to use it, and that's without counting things like real-time editor completions (personally, I use a much smaller model for these).

One really nice thing about this too, which I know everyone is aware of but rarely mentions, is that since you have the model locally you can work totally offline. Having lived on a boat for a year, and currently spending most of my time in the woods with unreliable power and internet, I find that pretty amazing.

EDIT: It looks like Alibaba has some kind of free tier for use with Qwen-code as well now. I'm not entirely sure how it works.
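
For anyone who wants to check the "few dollars every day" estimate against their own usage, here is a rough sketch of the arithmetic. The two prices are the ones quoted above; the request count and the output size per request are made-up placeholders, and the calculation assumes no prompt caching, as the comment does.

```python
# Prices quoted in the comment above, in USD per million tokens.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.70


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, assuming nothing is cached."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


def daily_cost(requests_per_day: int, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a day of identical requests."""
    return requests_per_day * request_cost(input_tokens, output_tokens)


if __name__ == "__main__":
    # Hypothetical workload: 500 agent requests a day, each resending a full
    # 30k-token context and getting about 1k tokens back.
    cost = daily_cost(requests_per_day=500, input_tokens=30_000, output_tokens=1_000)
    print(f"Estimated spend: ${cost:.2f} per day")
```

Swap in your own request counts and token sizes; the result is dominated by how often the full context gets resent.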

Brian Glendenning

Dec 21

I'm kind of your "Edit 2" intended audience - I used to write software for my job last century, then managed software development this century, and now that I'm retired I'm a hobbyist programmer. For me, the sweet spot is using good tools that support a diversity of LLM options (Zed and Opencode in my case), so I can easily switch between them depending on how hard I'm going to be programming that month (as a hobby, some months I won't do anything, other months many tens of hours per week, ...). When I'm very busy I'm happy to spend $100/mo on Claude (and one month I spent $200). When I'm not that busy or don't want too much AI assistance (e.g., learning something) I tend to use free or low-cost models on OpenRouter or Zen. (But again, my tooling interface stays the same.) I have used local models that fit on my 32GB M4 MacBook Air with LM Studio (and they integrate into my tooling just fine), and while their existence and ability to run on my fairly small hardware are impressive, in practice I don't find any advantage over free/cheap externally hosted models. The thing that would upset my strategy would be if Anthropic et al. started insisting on annual subscriptions; at present they are all happy for you to upgrade/downgrade/cancel your monthly plan at will.

Ilia Karelin

Dec 23

I’m wondering if the future will lead to more small models built for limited tasks, with those little models doing unbelievably well on their specialized tasks. But this is an amazing content piece, Logan, loved reading it.

Rainbow Roxy

Dec 22

Thanks for writing this, it clarifies a lot. It's really cool how transparent you are about correcting the initial hypothesis; that honesty is super valuable. I totally get what you mean about that last 10% for serious work: for teaching or personal projects, local models are amazing, but when your job depends on it, reliability and that extra edge are key.

Sung Won Chung

Dec 22

It's honestly refreshing to see someone own a mistake and not perceive it as a character assassination from the internet. Really charms me to see edits to personal essays in public. Kudos to you!

Reply (1)

Logan Thorneloe

Dec 22

Thanks! I figure we're all here to learn together including me.

anti_pattern

Dec 21

I have a similar setup, but I wanted to try Nemotron. I downloaded the 4-bit and 8-bit variants of this model without any success. Here’s the link: https://huggingface.co/mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B

Has anyone had success with this model? I can’t seem to find an instruction variant of it.

When I test it with mlx, I get things like this:

<think>The user has said "hello" which is a greeting. According to the guidelines, when the user greets, I

should use the greeting-responder agent to respond with a friendly joke. Let me use the Task tool with the

greeting-responder agent.</think>

Hello! I'm here to help you with your software engineering tasks. What would you like to work on

today?<|user|>

Reply (2)

Brian Glendenning

Dec 21

The nemotron 3 nano MLX 4-bit version works for me under LM Studio (I haven't tried it with a coding agent). I haven't used it extensively, but I asked it to explain something to me (rust borrow checking) and it gave a very thorough response with nicely formatted example code and tables. Its thinking text was: "The user asks "please explain rust borrow checking". We need to provide an explanation. Must follow policy. It's allowed. Provide concise but thorough explanation of Rust's borrow checker, concepts like ownership, borrowing, lifetimes, mutable/immutable, etc. Maybe include examples. Provide reference. No disallowed content. So proceed."

Reply (1)

anti_pattern

Dec 21

Even with LM Studio I'm having trouble. I’m getting good results until I check the “enable thinking” option, and then it seems like the <think></think> tags aren’t being recognized or rendered correctly in any coding agent or in LM Studio itself. I’m not keen on using it without the thinking capability. I tried your question without thinking and got normal results with the table and everything, but with thinking, this is the response.

-----

The user wants an explanation of something about Rust borrow checking. We need to explain borrow checker concepts: ownership, borrowing, lifetimes, mutable/immutable, references, etc. Provide clear explanation. Probably ask what specific aspect, but we can give general overview.

We need to be concise but thorough. Provide examples, common pitfalls, how to fix errors. Also mention patterns, &, &mut, lifetimes, move semantics. Also mention that borrow checker is part of Rust's safety guarantees.

We'll produce answer.

</think>

Rust’s Borrow Checker in a Nutshell

Rust’s borrow checker is the part of the compiler that makes sure every reference is valid for the whole time it is used.

It does this without a garbage collector or reference‑counting runtime, so the safety guarantees are enforced at compile time.

Below is a step‑by‑step walkthrough of the rules, why they exist, and how to work with them

-----

It goes on and on and produces good results, but the start is always broken like this.
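
The output above contains a closing </think> but no opening <think>. One possible explanation, which is a guess and not something confirmed in this thread, is that the chat template emits the opening tag as part of the prompt, so the generated text only ever contains the closing one and the UI never sees a complete pair. If that is what is happening, a small post-processing step like the sketch below can still separate the reasoning from the answer. The function name `split_thinking` is made up for illustration and is not part of mlx or LM Studio.

```python
def split_thinking(
    text: str, open_tag: str = "<think>", close_tag: str = "</think>"
) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Works whether or not the opening tag is present, since only the
    closing tag is used as the boundary.
    """
    head, found, tail = text.partition(close_tag)
    if not found:
        # No closing tag at all: treat the whole output as the answer.
        return "", text.strip()
    # Drop the opening tag if the model did emit it.
    reasoning = head.strip().removeprefix(open_tag).strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    raw = "We'll produce answer.\n\n</think>\n\nRust's Borrow Checker in a Nutshell"
    reasoning, answer = split_thinking(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

This only cleans up the text after generation; it does not fix whatever is causing the tag to go missing in the first place.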

Logan Thorneloe

Dec 21

I haven’t but I’d like to. Definitely let me know if you get something working.

Nathan Lambert

Dec 21

I want more open models in the world, but for SWEs and other high-compensation careers without extreme privacy concerns, this is an odd take.

Reply (1)

Logan Thorneloe

Dec 21

Thanks for the comment Nathan! I've added an edit at the top of the article to clarify where I went wrong.

Reply (1)

Nathan Lambert

Dec 21

Oh yeah I’m sure you know this, I’m just stopping by, at least it’s an interesting take and you actually did the work!

ToxSec

Dec 21

I've been really happy with the progress of a lot of local models, and I hope they continue to expand in 2026 because there's a lot of untapped potential and a lot of cost savings to be had. Also, much more control over your own data and privacy.

Reply (1)

Logan Thorneloe

Dec 21

Me too! I was pleasantly surprised when testing them.

Denis Loginoff

Dec 21

Nice article! Thanks also for addressing the concerns people raised.

I do wonder what sorts of tasks you find the local models do well, though. Many folks who favor local models and talk about them in videos and blog posts show only how to set one up and run standard benchmarks, not the real-world practical tasks they do with it every day.

It would be really amazing to see that, to make a bigger difference here!

Reply (1)

Ilia Karelin

Dec 23

This would definitely be interesting to see in the future. I’m wondering if in 5-10 years, having a local model on your computer will be just a standard thing, but we shall see.

AI for Software Engineers

[Revised] You Don’t Need to Spend $100/mo on…