12 Comments
User's avatar
JV's avatar

Isn't this missing the obvious comparison to running the *same* model you are able to run locally but through a cloud provider?

I'd be interested in seeing see both cost and performance comparisons to running your setup through e.g. OpenRouter instead.

Expand full comment
Logan Thorneloe's avatar

This is a good idea!

Expand full comment
Brian Glendenning's avatar

I'm kind of your "Edit 2" intended audience - I used to write software for my job last century, then managed software development this century, and now that I'm retired I'm a hobbyist programmer. For me, I find the sweet spot is using good tools that support a diversity of LLM options (Zed and Opencode in my case), and then I can easily switch between various options depending on how hard I'm going to be programming that month (as a hobby some months I won't do anything, other months many 10s of hours per week, ...). When I'm very busy I'm happy to spend $100/mo on Claude (and one month I spent $200). When I'm not that busy or don't want too much AI assistance (e.g., learning something) I tend to use free or low-cost models on Openrouter or Zen. (But again, my tooling interface stays the same). I have used local models that fit on my 32GB M4 Macbook air with LM Studio (and they integrate into my tooling just fine), and while their existence and ability to run on my fairly small hardware is impressive, in practice I don't find any advantage over free/cheap externally hosted models. The thing that would upset my strategy would be if Anthropic et. al. started insisting on annual subscriptions, at present they are all happy for you to upgrade/downgrade/cancel your monthly plan at will.

Expand full comment
anti_pattern's avatar

I have a similar setup, but I wanted to try Nemotron. I downloaded the 4-bit and 8-bit variants of this model without any success. Here’s the link: https://huggingface.co/mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B

Have anyone had success with this model? I can’t seem to find an instruction variant of it.

When I test it with mlx , I get things like this:

<think>The user has said "hello" which is a greeting. According to the guidelines, when the user greets, I

should use the greeting-responder agent to respond with a friendly joke. Let me use the Task tool with the

greeting-responder agent.</think>

Hello! I'm here to help you with your software engineering tasks. What would you like to work on

today?<|user|>

Expand full comment
Brian Glendenning's avatar

The nemotron 3 nano MLX 4-bit version works for me under LM Studio (I haven't tried it with a coding agent). I haven't used it extensively, but I asked it to explain something to me (rust borrow checking) and it gave a very thorough response with nicely formatted example code and tables. Its thinking text was: "The user asks "please explain rust borrow checking". We need to provide an explanation. Must follow policy. It's allowed. Provide concise but thorough explanation of Rust's borrow checker, concepts like ownership, borrowing, lifetimes, mutable/immutable, etc. Maybe include examples. Provide reference. No disallowed content. So proceed."

Expand full comment
Logan Thorneloe's avatar

I haven’t but I’d like to. Definitely let me know if you get something working.

Expand full comment
Nathan Lambert's avatar

I want more open models in the world, but for SWEs and other high compensation careers without extreme privacy concerns, this is an odd take.

Expand full comment
Logan Thorneloe's avatar

Thanks for the comment Nathan! I've added an edit at the top of the article to clarify where I went wrong.

Expand full comment
Nathan Lambert's avatar

Oh yeah I’m sure you know this, I’m just stopping by, at least it’s an interesting take and you actually did the work!

Expand full comment
ToxSec's avatar

I've been really happy with the progress of a lot of local models, and I hope they continue to expand in 2026 because there's a lot of untapped potential and a lot of cost savings to be had. Also, much more control over your own data and privacy.

Expand full comment
Logan Thorneloe's avatar

Me too! I was pleasantly surprised when testing them.

Expand full comment
Logan Thorneloe's avatar

Thank you to all the comments here and on other platforms! I've added an edit to correct myself. I realized someone could make a financial decision based on this non-empirical conclusion and I don't want that happening.

Please see edit 2 at the top of the article for my corrections.

Expand full comment