AI Can Do Your Job - Now What? | AI for Software Engineers 77
This week: AI killed Tailwind’s business model, Apple admits Siri needs Google, Anthropic blocks the competition, and more.
Two releases this week show how far AI coding tools have come. The first: Claude Opus 4.5 is now more accessible with higher rate limits, and Claude Code has improved its planning, spending more time on design and less on iteration, which leaves developers enough tokens to use it full-time.
The second is Ralph Wiggum, a methodology and Claude Code plug-in that lets terminal agents work autonomously for hours. It breaks a task into work items with completion criteria, then loops until every criterion is met, so the output matches the specification.
The key to making this work so well is periodically resetting context and tracking progress in external files rather than keeping everything in memory. This prevents the drift that creeps into long-running sessions and lets a brand-new agent take a fresh stab at the problem until it’s done.
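If you want a feel for the pattern, here’s a minimal sketch of a Ralph-style loop (my own illustration, not the plug-in’s actual code). It assumes a hypothetical agent CLI invoked as `agent --prompt <text>` that runs one fresh session per call, and a plan.md where work items get checked off as `- [x]` once their completion criteria pass.

```python
# Minimal sketch of a Ralph-style loop (illustrative, not the plug-in's code).
# Assumes a hypothetical CLI `agent --prompt <text>` that runs one fresh
# session per invocation, and a plan.md of work items with completion criteria.
import subprocess
from pathlib import Path

PLAN = Path("plan.md")
MAX_SESSIONS = 50  # safety cap so the loop can't run forever

PROMPT = (
    "Read plan.md, pick ONE unchecked work item, implement it, "
    "run its completion criteria, and mark it `- [x]` only if they pass. "
    "Record anything the next session needs to know in notes.md."
)

def unfinished_items() -> int:
    # Progress lives in files, not in the agent's context window.
    return PLAN.read_text().count("- [ ]")

for _ in range(MAX_SESSIONS):
    if unfinished_items() == 0:
        break
    # Each iteration is a brand-new session: context resets, so drift
    # from long-running conversations can't accumulate.
    subprocess.run(["agent", "--prompt", PROMPT], check=False)

print(f"Unfinished work items remaining: {unfinished_items()}")
```

The design point is that all state lives in plan.md and notes.md, so every iteration starts with a clean context window and still knows exactly where the work stands.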
Together, these mean you can hand a coding agent a product specification in the evening, let it work overnight, and have code ready in the morning. That code usually meets the spec and is solid enough for a minimum viable product, sometimes better.
So now that AI can whip up these prototypes overnight, what does that mean for you? A few things:
Be user- and product-focused. The important parts of software engineering are still important. Understanding the product and outlining requirements to fulfill it is still on the engineer (e.g. writing the requirements you hand to Ralph, as above). Studies show product-focused teams get more out of AI developer tools than their counterparts, and iterating on high-quality user feedback is key to maintaining that focus.
Learn to use AI tools. This should be self-evident, but some engineers still refuse to learn them. They’re the future of software development, and the learning curve to use them effectively is steep. If you want to take the next step toward using AI to be more productive, try new AI coding methodologies and tools such as the Ralph loop; implementing something like the sketch above in your own environment is a good way to get hands-on this week.
Get good at reviewing. I know this is the boring part of engineering, but it’s now more important than ever. Review well enough that you’re confident in what ships to production and that you understand how it works. Get very good at system design, too: in my experience, integration with surrounding systems is where AI coding tools fail, and those failures are often the hardest to detect.
Here’s everything else you need to know from this past week.
My Picks
Standalone content worth your time:
Finding and fixing Ghostty’s largest memory leak by Mitchell Hashimoto: A deep dive into debugging Ghostty’s PageList memory leak that grew to 37 GB after 10 days. The fix involved preventing reuse of non-standard pages during scrollback pruning. A great example of methodical debugging with practical techniques like macOS VM tagging.
8 plots that explain the state of open models by Nathan Lambert: China’s open models dominate adoption, led overwhelmingly by Qwen, whose top variants have more downloads than many competitors combined. Qwen also leads finetuning activity on HuggingFace, though DeepSeek dominates at very large model scales.
5 GPU performance optimization methods: An easy-to-follow explanation of five GPU optimization methods for LLMs: batching, mixed precision (FP16), tensor/kernel fusion, memory pooling, and CUDA stream management. Practical impacts include roughly 2x memory savings with FP16 (see the quick arithmetic after this list).
Demystifying evals for AI agents by Anthropic: A comprehensive guide on why agent evals are harder than model evals. Autonomy, tool use, and long-horizon planning introduce external dependencies and emergent behaviors that traditional testing can’t handle. Covers strategies for realistic environments, mixing automated and human assessments, and measuring both task performance and failure modes.
No, Claude Code doesn’t need a better UI by Logan Thorneloe: I wrote about why Claude Code’s terminal-based approach is actually its strength. The terminal is standardized, scriptable, and predictable, making it ideal for automation compared with brittle GUIs. Claude can control files, apps, and any CLI- or API-driven application via text commands.
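On the mixed-precision pick above: the roughly 2x memory saving falls straight out of byte widths, since FP32 weights take 4 bytes per parameter and FP16 takes 2. A quick back-of-the-envelope in Python, using an illustrative 7B-parameter model:

```python
# Back-of-the-envelope for the ~2x FP16 memory saving, using an
# illustrative 7B-parameter model (the arithmetic, not any specific model).
params = 7_000_000_000
bytes_fp32, bytes_fp16 = 4, 2  # bytes per parameter

gib = 1024 ** 3
print(f"FP32 weights: {params * bytes_fp32 / gib:.1f} GiB")  # ~26.1 GiB
print(f"FP16 weights: {params * bytes_fp16 / gib:.1f} GiB")  # ~13.0 GiB
```

This counts weights only; activations, optimizer state, and KV cache add more, so real-world savings vary.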
Claude Cowork brings terminal agents to everyone
Anthropic released Claude Cowork, an adaptation of Claude Code that runs in the Claude app on Mac and performs general-purpose computer tasks. This is only available to Max subscribers and only on Mac for now.
I just wrote an article about how Claude is a general-purpose computer-use agent, not just a coding tool, which means you can get just about anything done that you could do via the terminal by prompting Claude. I stand by my view that the terminal is still an excellent UI: watching Claude work there builds intuition about what it can and can’t do. More on Claude’s productive capabilities in the sources below.
Source: Simon Willison on Cowork, Cowork announcement on X, Ethan Mollick on Claude Code, My article on Claude Code as a computer use agent
Anthropic restricts third-party API access amid abuse concerns
Anthropic blocked two parties from using their resources this week:
Competitors such as OpenAI and xAI, to give Anthropic a competitive advantage.
Third-party harnesses that took advantage of Claude Max subscriptions, to ensure usage rates on these subscriptions can’t be spoofed.
This prompted competing tools such as Codex to court the third-party harnesses whose users previously ran Claude models. It leaves me wondering two things: how much goodwill did Anthropic burn to stop the spoofing, and what will be the long-term impact of rival tools becoming more accessible to users?
Source: X trending on Anthropic API restrictions
Apple partners with Google to power next-gen Siri with Gemini
Apple signed a multi-year deal to base its upcoming Foundation Models on Google’s Gemini, enabling a more personalized Siri expected later this year. All inference and customization will run on Apple silicon and Apple’s Private Cloud Compute to preserve user privacy. My understanding is that Apple’s models will be based on the same LLM technology as Google’s.
I’ve seen a lot of takes on this, the most prominent being that Apple has admitted defeat. Instead, think of it as a business decision: Apple doesn’t have a model ready that it trusts to deliver an excellent assistant experience, so it’s using Google’s models for now to ship a quality product and avoid losing ground in the smartphone market. In reality, Apple is doing quite well in AI, as its silicon and hardware have become a staple for serving large models.
Source: Apple-Google Gemini partnership
AI in healthcare faces mounting scrutiny from regulators and experts
A few things happened in AI-related healthcare news this week:
Google removed several AI-generated health summaries to prevent the spread of misinformation.
OpenAI added Health to ChatGPT, enabling a user to discuss their health and health records with ChatGPT directly in the app.
Studies show more people are using AI for self-diagnosis, with one figure showing 59% of Brits are doing so.
OpenAI says the goal is to give accurate healthcare information and to ground users’ health-related queries in the context of their current health records. Many are skeptical of sharing personal health data with ChatGPT, since most ChatGPT queries are used for training; OpenAI has guaranteed this won’t be the case with Health.
Source: Google removes misleading AI health summaries, 59% of Brits use AI for diagnosis, ChatGPT Health critique
Tailwind’s layoffs reveal how AI adoption can destroy business models
Tailwind cut 75% of its staff after AI coding agents drove the CSS framework to 75 million downloads per month while simultaneously killing 40% of its site traffic. That traffic drove conversions to paid services, so its loss contributed to an 80% revenue drop. Shortly after, Google AI Studio announced it would sponsor the Tailwind project.
Tailwind is one of the most popular CSS frameworks, but AI is fundamentally changing how information is consumed and transferred, and business models will need to adapt with it.
Source: Tailwind layoffs, Google AI Studio sponsorship
Building reliable AI agents requires rethinking evaluation
The hard part of agent observability is that logic has shifted from code to models: traditional test cases fail because model output can’t be checked deterministically.
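To make that concrete, here’s a minimal sketch of the shift from exact-match assertions to outcome scoring. `run_agent` and the criteria are hypothetical stand-ins for your own harness, not any specific eval framework:

```python
# Minimal sketch of outcome-based scoring for a non-deterministic agent.
# `run_agent` and the criteria are hypothetical stand-ins for your harness.
from typing import Callable

def pass_rate(run_agent: Callable[[str], str],
              task: str,
              criteria: list[Callable[[str], bool]],
              trials: int = 10) -> float:
    """Run the agent repeatedly and score how often ALL criteria hold,
    instead of asserting one exact output string."""
    passes = 0
    for _ in range(trials):
        output = run_agent(task)  # non-deterministic: varies run to run
        if all(check(output) for check in criteria):
            passes += 1
    return passes / trials

# Usage: criteria test properties of the outcome, not literal text.
# rate = pass_rate(my_agent, "migrate the DB schema",
#                  criteria=[lambda o: "error" not in o.lower(),
#                            lambda o: "migration complete" in o.lower()])
```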
Anthropic recently released a blog post detailing agent evals and what makes them so tough, including gold-standard methods for testing coding, computer-use, and conversational agents. One big takeaway: evals aren’t foolproof and need to be accompanied by production monitoring, A/B testing, and user feedback. I highly recommend reading Anthropic’s post, linked below.
Source: Harrison Chase on traces as documentation, Anthropic on agent evals
Quickies
Malaysia and Indonesia blocked Grok after regulators found it was generating sexually explicit images, including depictions of minors. src
US job openings dropped to 7.15 million in November, the lowest in over a year, with vacancies per unemployed worker falling to 0.9. src
NVIDIA and Eli Lilly will invest up to $1 billion over five years on an AI co-innovation lab for drug discovery. src
Bose is open-sourcing SoundTouch’s API instead of bricking the speakers when cloud support ends. src
Meta’s $2 billion acquisition of Manus triggered a Chinese Ministry of Commerce review for potential export control violations. src
Gemini CLI now offers “Agent Skills” that can be installed via npm. src
Self-hosting has become practical with cheap mini PCs, Tailscale, and CLI agents like Claude Code handling setup. src
Thanks for reading!
Always be (machine) learning,
Logan


