Discussion about this post

JV

Isn't this missing the obvious comparison to running the *same* model you are able to run locally but through a cloud provider?

I'd be interested in seeing both cost and performance comparisons to running your setup through e.g. OpenRouter instead.
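Even a rough timing of the same prompt against an OpenRouter endpoint would be telling. Here's a minimal sketch of what I mean, using the OpenAI-compatible API OpenRouter exposes — the model slug and API key below are just placeholders:

```python
import time
from openai import OpenAI

# Sketch only: time one chat completion against OpenRouter's
# OpenAI-compatible endpoint and report rough tokens/sec.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

start = time.perf_counter()
resp = client.chat.completions.create(
    model="some-provider/some-model",  # placeholder: slug of the model you run locally
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=256,
)
elapsed = time.perf_counter() - start

completion_tokens = resp.usage.completion_tokens
print(f"{completion_tokens} tokens in {elapsed:.1f}s "
      f"({completion_tokens / elapsed:.1f} tok/s)")
print(resp.choices[0].message.content)
```

That, plus the per-token pricing OpenRouter lists for the model, gives the cost side of the comparison.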

anti_pattern

I have a similar setup, but I wanted to try Nemotron. I downloaded the 4-bit and 8-bit variants of this model without any success. Here’s the link: https://huggingface.co/mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B

Has anyone had success with this model? I can’t seem to find an instruction-tuned variant of it.

When I test it with mlx, I get things like this:

<think>The user has said "hello" which is a greeting. According to the guidelines, when the user greets, I should use the greeting-responder agent to respond with a friendly joke. Let me use the Task tool with the greeting-responder agent.</think>

Hello! I'm here to help you with your software engineering tasks. What would you like to work on today?<|user|>
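For reference, this is the kind of minimal mlx-lm harness I mean — just a sketch, assuming the linked repo downloads cleanly, and applying its chat template if one is present (a conversion without a proper instruct template would explain the leaked <|user|> token):

```python
# Minimal sketch: load the MLX conversion and generate once with mlx-lm.
# The repo id is the mlx-community link above; whether it ships a usable
# instruct chat template is exactly the open question.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B")

prompt = "hello"
if tokenizer.chat_template is not None:
    # Wrap the prompt in the model's chat format so special tokens
    # like <|user|> are handled for us instead of leaking into the output.
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(response)
```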

